FOREWORD


These  Proceedings  preserve  in  print  most  of  the  invited  addresses  and 
contributed  papers  of  the  1981  Army  Numerical  Analysis  and  Computers 
Conference.  The  Army  Mathematics  Steering  Committee  (AMSC)  sponsors  these 
meetings  on  behalf  of  the  Office  of  the  Chief  of  Research,  Development  and 
Acquisition. Members of this committee insist that the guest lecturers be
internationally known scientists who are effective researchers and are presently
working in frontier fields of current interest. They feel that the
addresses  by  the  invited  speakers  as  well  as  the  contributed  papers  by  Army 
personnel  will  stimulate  the  interchange  of  ideas  among  the  scientists 
attending  said  meetings. 

Under  the  date  of  15  October  1980,  Colonel  Robert  J.  Feist,  Acting  Director, 
US Army Missile Command, issued a formal invitation to hold the 1981 conference
at his installation. Part of his letter to Dr. Jagdish Chandra,
Chairman of the AMSC, is quoted below:

The  Army  Mathematics  Steering  Committee  is  invited  to  hold  its  1981 
Numerical  Analysis  and  Computers  Conference  at  the  U.  S.  Army  Missile 
Command.  The  dates  25-26  February  1981  are  suggested  as  being  a 
suitable  time  for  this  purpose. 

The  Army  Missile  Command  is  looking  forward  to  offering  this  opportunity 
for  mathematicians  and  other  scientists  doing  research  for  the  Army  to 
share  their  ideas  with  each  other  and  with  this  command. 

The  MICOM  point  of  contact  in  making  further  arrangements  is  Dr.  B.  Z. 
Jenkins,  Autovon  746-7279. 

This  is  the  second  in  this  series  of  conferences  to  have  as  its  host  the 
U.  S.  Army  Missile  Command.  The  1978  conference  was  held  at  Redstone 
Arsenal  and  had  Dr.  S.  H.  Lehnigk  as  its  Chairman  on  Local  Arrangements. 

This year Dr. B. Z. Jenkins served in this capacity. Both of these gentlemen
are members of the AMSC, and both of them did an excellent job of
handling the many details needed to conduct meetings of this size.

The  theme  of  this  year's  conference  was  "Mathematical  Software".  Not  only 
did the invited speakers treat this important area, but many of the contributed
papers emphasized it. Preceding the conference, on 24-25 February 1981,
a tutorial on "Software Reliability" was offered by Professors B. Littlewood
of the City University of London (on leave at George Washington University)
and V. Basili of the University of Maryland. Another special event was an
evening session on the "UNIX Operating System" for certain computers manufactured
by the Digital Equipment Corporation. The speakers for this
meeting  were  Drs.  B.  Henriksen  and  Fred  Bunn  of  the  Ballistic  Research 
Laboratory,  together  with  Professor  R.  Riesenfeld  of  the  University  of  Utah. 
The  names  of  the  invited  speakers  and  the  titles  of  their  addresses  are 
noted  on  the  following  page. 

OBSERVATIONS ON THE MATHEMATICAL
SOFTWARE  EFFORT 

ALGEBRAIC  COMPUTATION 


NUMERICAL  SOFTWARE  FOR  FIXED  POINT 
MICROPROCESSOR  APPLICATIONS  AND  FOR 

The success of this conference was due to many scientists, including the
active and enthusiastic members of the audience, the chairpersons of the
sessions and the authors of the papers. The members of the AMSC would like
to thank these gentlemen for taking time to prepare papers for these proceedings
so that persons unable to attend this symposium can profit by their
contributions to the scientific literature.

Abstract. John Rice introduced the term 'mathematical software' in 1969 to
denote "the set of algorithms in the area of mathematics." Beginning with a
meeting  at  Purdue  University  the  following  year,  work  on  mathematical  software 
has  attracted  ever  more  talented  people  and  has  steadily  gained  in  professional 
recognition.  In  this  paper  we  discuss  certain  milestones,  both  successes  and 
disappointments, which mark this rise. We also examine the present work spectrum
and discuss new problems arising from advancing technology and changing
work patterns.


1. Introduction. John Rice coined the term 'mathematical software' in 1969 and
focussed  attention  on  the  subject  the  following  year  with  a  symposium  held  as 
part  of  a  Special  Year  in  Numerical  Analysis  at  Purdue  University.  The  movement 
spawned  by  that  first  meeting  has  been  fruitful.  In  1969  only  a  few  individuals 
worked on what we now call mathematical software, and only a few fortunate computer
sites had access to decent numerical programs. Today many talented people
work  in  the  field,  large  collections  of  good  numerical  software  are  widely 
available,  and  specialized  meetings  are  common. 

A  description  of  the  mathematical  software  effort  is  difficult  because  it 
is so broad. Its domain is that nebulous region between the discovery of numerical
algorithms and the consumption of numerical software. On the one hand
numerical analysts devise new computational methods, and on the other hand individuals
wish to apply effective methods to their immediate problems. It is the
job of the mathematical software effort to bridge the gap by packaging numerical
analysts' work in software appealing to the consumer. Strictly speaking, work
on  mathematical  software  is  limited  to  tasks  related  to  the  implementation  of 
numerical  algorithms.  In  practice  the  spectrum  of  activities  is  surprisingly 
wide because the process of implementation is itself worthy of study. In addition
to obvious concerns with program design and testing, there are major concerns
with programming practices, documentation standards, software organization
and distribution methods. Other activities involve the development of programming
tools to partially automate design, implementation, testing and maintenance
of software, and work on the computational environment, including the design of
arithmetic systems and programming languages properly supportive of good numerical
software. Major contributions have been made in each of these areas by
individuals  who  consider  their  primary  interest  to  be  mathematical  software. 

The  published  proceedings  of  the  Purdue  meeting  contain  Rice's  appraisal  of 
the mathematical software effort as it stood in 1970 [45], including a chronological
account of progress. This paper is a similar appraisal of the effort as
it stands today. Instead of updating the chronological record, however, we discuss
what we consider to be major milestones marking progress to this point. We
then examine current problems in the field and future challenges posed by an
advancing  technology.  This  work  was  inspired  by  a  panel  session  on  the  same 
subject that ended the week-long International Seminar on Problems and Methodologies
in Mathematical Software Production held this past November in
Sorrento, Italy, under the sponsorship of The University of Naples and the
C.N.R. We gratefully acknowledge the contributions of our fellow panelists, B.
Ford, T. J. Dekker, M. Gentleman, J. N. Lyness and P. C. Messina, and a responsive
audience. With the benefit of leisurely reflection we have reorganized and
expanded some of their ideas and combined them with our own thoughts on the
matter. We alone are responsible for the selection and expression of the opinions
that follow, however.

The  reader  should  be  aware  that  the  views  presented  below  may  be  colored  by 
personal  bias,  and  that  other  views  exist.  The  surveys  and  suggestions  for 
future research in [24,28,39,47,48] are especially recommended to the interested
reader.


2.  The  Past.  Many  people  associate  the  beginning  of  the  mathematical  software 
effort  with  Rice’s  1969  call  for  a  meeting  at  Purdue  University  [44].  The  roots 
go  back  further,  however.  While  it  would  be  trite  to  trace  them  to  the  first 
numerical subroutine libraries, we detect an emerging concern for software quality
in the early 1960's. By then individuals at The University of Toronto, The
University  of  Chicago,  Stanford  University,  Bell  Laboratories  and  Argonne 
National  Laboratory  were  critically  examining  software  and  advertising  their 
findings through technical reports and discussions at computer user group meetings.
The ideas and evaluation techniques were not well enough established for
publication in refereed journals, however, and efforts were hampered by poor
communications. Often workers at one location were completely unaware of similar
work elsewhere. Yet each of these computing centers developed outstanding
program  libraries  by  contemporary  standards. 

In early 1966 J. F. Traub organized SICNUM, the Special Interest Committee
on  Numerical  Mathematics.  The  group  grew  quickly,  and  by  midyear  when  the  first 
informal  SICNUM  Newsletter  appeared  it  had  a  membership  of  almost  1000.  Two 
articles in the first Newsletter typify SICNUM's interests. The first announced
the establishment of a working group "to investigate testing and certification
techniques for numerical subroutines," and the second announced a SICNUM-sponsored
evening session at the 1966 National ACM Conference. The session included
a panel discussion "in the area of machine implementation of numerical algorithms."
By constantly emphasizing efforts to improve the quality of numerical
software,  SICNUM  and  its  successor  SIGNUM  set  the  stage  for  Rice’s  1969  call  for 
a  symposium. 

In his call Rice defined mathematical software as "computer programs which
implement widely applicable mathematical procedures" [44]. This contrasts with
the definition he later included in the published proceedings, "the set of algorithms
in the area of mathematics" [45]. These two definitions illustrate the
fundamental  confusion  between  algorithms  and  computer  programs  that  plagued  the 
early  development  of  numerical  software.  The  realization  that  an  implementation 
is  different  from  the  underlying  algorithm  marks  the  emergence  of  mathematical 
software  as  a  separate  field  of  endeavor. 
That difference was not widely understood in 1969. Despite early admonitions
from G. Forsythe [22] and other prominent researchers, most numerical
analysts  still  believed  that  their  work  was  finished  when  they  had  defined  an 
algorithm.  Computer  programming  was  a  job  for  computer  programmers;  numerical 
analysts  only  programmed  when  it  was  necessary  for  their  research  (and  pure 
mathematicians never programmed). A university professor seeking advancement
and  tenure  shied  away  from  working  on  numerical  software.  As  a  result  most  of 
the  early  work  was  concentrated  in  government  and  industrial  laboratories  with 
only a few selfless university people involved. Unfortunately, the same attitudes
are still common. While work on mathematical software has gained some
professional stature and there are more talented people involved in the effort
today than were involved in 1969, many others still do not dare to become
involved if they seek promotion. This is still especially true at many universities.

Three  software  projects  that  greatly  influenced  the  mathematical  software 
effort  began  about  the  time  of  the  Purdue  meeting.  Each  project,  IMSL,  NAG  and 
NATS,  resulted  in  a  widely-used  collection  of  high-quality  numerical  software. 
Certain software collections were publicly available before this. Computer
user groups had organized program repositories by the early 1960's, and the
IBM Scientific Subroutine Package (SSP) was available on the IBM 7094, for example.
Although these collections contained a few good programs, their general
reputations  were  deservedly  notorious.  The  IMSL,  NAG  and  NATS  collections  were 
the  first  to  combine  quality  with  wide  distribution. 

IMSL,  International  Mathematical  and  Statistical  Libraries,  Inc.,  was 
founded  in  1971  by  some  of  the  people  involved  in  the  IBM  SSP  effort.  It 
delivered the first purely commercial numerical subroutine library to IBM customers
a year later. By mid 1973 the library had also been delivered to UNIVAC
and  CDC  customers,  and  for  the  first  time  the  same  library  of  numerical  programs 
became  available  on  a  variety  of  computing  equipment.  This  enabled  numerical 
programmers  to  write  and  distribute  applications  programs  without  worrying  about 
the  availability  of  a  decent  support  library.  Today  IMSL  supports  most  major 
computers.  The  success  of  this  venture  is  suggested  by  the  number  of  computing 
centers  now  relying  on  IMSL  and  its  competitors  for  their  core  library,  thus 
freeing  local  personnel  to  develop  the  specialized  programs  necessary  for  their 
own  work. 

IMSL's main competition comes from the NAG and, to a lesser extent, PORT
libraries. NAG, originally Nottingham Algorithms Group but now Numerical Algorithms
Group, was organized about 1970 in Great Britain as a cooperative venture
between  universities  using  ICL  1906A  computers.  Supported  by  heavy  government 
subsidies,  NAG  extended  its  coverage  to  other  machines  and  now  seeks  to  become 
self-supporting.  The  PORT  library  is  a  product  of  Western  Electric  arising  from 
the  early  library  work  at  Bell  Laboratories.  It  is  not  aggressively  marketed, 
and  is  therefore  not  as  widely  used  as  the  IMSL  and  NAG  libraries. 

The  NATS  project,  National  Activity  to  Test  Software,  was  conceived  in  1970 
and funded in 1971 by the National Science Foundation and the Atomic Energy Commission
to study problems in producing, certifying, distributing and maintaining
quality  numerical  software  [5].  This  was  a  cooperative  effort  between  personnel 
at  Argonne  National  Laboratory,  Stanford  University,  The  University  of  Texas  at 
Austin and scattered test sites to examine software production as a research
problem. Intrinsic to this effort was the production of two software packages,
the EISPACK collection of matrix eigensystem programs and the FUNPACK collection
of special function programs. The project formally ended with the distribution
of extended second releases of both packages in 1976 [11,23,51].

By  any  measure,  the  NATS  project  was  a  spectacular  success.  Not  only  did 
it produce superior software, but it also pioneered in organizational and technical
achievements that are still being exploited. For example, the project
developed an early system for automated program transformation and maintenance
[50] that led directly to current research on the TAMPR system [6]. We believe
that  the  NATS  aids  were  developed  before  similar  aids  for  program  transformation 
were  developed  at  JPL  [31]  and  within  the  IMSL  [3]  and  NAG  [19]  projects.  They 
were certainly the first to be successfully used in a software project. Important
as such technical achievements were, however, they were overshadowed by the
organizational concepts the project developed. Machura and Sweet recently
stated [36], "The most important lesson learned from the EISPACK project is that
the development and distribution of quality software can be achieved by the
joint efforts of several different organizations." Before the NATS success
software  was  typically  developed  with  the  limited  resources  of  one  organization; 
since  the  NATS  success  cooperative  ventures  have  become  common. 

None of this would have mattered if the NATS software had not been superior.
Fortunately, the software produced by the project was well received and
is  still  considered  to  be  some  of  the  best  available.  EISPACK,  in  particular, 
set  and  met  high  standards  for  performance,  transportability  and  documentation. 
It  has  become  a  paradigm  for  thematic  numerical  software  collections  with  the 
term  'PACK'  now  implying  all  that  is  good  in  numerical  software.  Attesting  to 
EISPACK's influence, the following PACKs in addition to FUNPACK either exist or
are in advanced planning stages: ELLPACK [46], FISHPAK [2], ITPACK [26], LINPACK
[17], MINPACK [37], PDEPACK [42], QUADPACK [43], ROSEPACK [14], SPARSPAK [25],
TESTPACK [9] and TOOLPACK [40]. While many of these are superb packages, the
use of a 'PACK' name does not automatically instill quality.

It  is  disappointing  that  the  NATS  experience  was  not  fully  exploited. 
Attempts  to  establish  a  central  organization  for  software  production  based  on 
the  NATS  concept  [15,16]  failed  for  various  political  and  technical  reasons. 
This  denied  segments  of  the  numerical  software  community  access  to  experienced 
people  and  important  resources.  Many  of  the  projects  mentioned  above  had  to 
rely on their own resources to coordinate production, certification and distribution
of their software, duplicating similar capabilities already developed in
other  projects. 

The  first  Purdue  symposium  was  followed  by  two  other  important  meetings. 
SIGNUM  sponsored  a  meeting  in  1971  in  Ljubljana,  Yugoslavia,  concurrent  with  the 
1971  IFIP  Congress,  that  ultimately  led  to  the  establishment  in  late  1974  of  WG 
2.5,  the  IFIP  Working  Group  on  Numerical  Software.  Members  of  the  working  group 
now represent numerical software interests in language and hardware standardization
efforts, often with detailed advice from the group as a whole. In addition,
the  working  group  has  organized  several  international  workshops  on  software 
topics  and  has  drafted  and  published  several  technical  reports. 

The  other  important  meeting  was  the  second  mathematical  software  symposium 
held  at  Purdue  in  1974.  While  its  influence  was  not  as  great  as  that  of  the 
first meeting, it did lead to the establishment of the ACM Transactions on
Mathematical  Software  with  John  Rice  as  editor.  Since  its  appearance  in  early 
1975  with  papers  from  the  Purdue  meeting,  TOMS  has  complemented  the  SIGNUM 
Newsletter  by  providing  an  outlet  for  refereed  numerical  software  papers. 

The  second  Purdue  meeting  was  also  noteworthy  for  the  first  open  discussion 
of  the  BLAs,  or  Basic  Linear  Algebra  Subprograms  [32].  As  the  name  implies,  the 
BLAs  are  a  collection  of  Fortran  subprograms  implementing  low  level  operations, 
such as the dot product, from linear algebra. The project was originally organized
in 1973 as a private effort to reach consensus on names, calling sequences
and functional descriptions for such programs, but it quickly became a cooperative
effort officially sanctioned by ACM-SIGNUM. Once conventions had been
agreed on, it was possible for linear algebra programs to do fundamental operations
in a uniform way. This was already a significant accomplishment, but the
group also prepared efficient implementations of the BLAs routines for most
popular computers. The project's most important contribution, however, was the
concept of establishing popular 'conventions' as opposed to official standards.
Language designers are reluctant to augment standard languages to include something
useful to only a small group. Even if that is done, years pass before the
new  feature  is  available  in  compilers.  The  establishment  of  private  conventions 
outside  language  standards  is  a  more  reasonable  approach,  and  the  BLAs  project 
demonstrates  that  it  is  also  a  practical  one.  As  with  NATS,  this  lead  has  not 
been  fully  exploited. 
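
To illustrate what a convention of this kind pins down, the following Python sketch mimics a dot-product routine with a calling sequence in the style of the BLAs: a length, two vectors, and a stride for each vector. The name `dot` and the Python rendering are assumptions made for illustration only; the BLAs themselves are Fortran subprograms, and their actual names and calling sequences are those given in [32].

```python
def dot(n, x, incx, y, incy):
    """Return the sum of x[ix] * y[iy] over n elements.

    Hypothetical Python rendering of a BLAs-style calling sequence.
    incx and incy are strides through x and y; a negative stride
    walks the vector backwards, starting from its far end.
    """
    if n <= 0:
        return 0.0
    # Starting index: 0 for forward traversal, far end for backward.
    ix = 0 if incx >= 0 else (1 - n) * incx
    iy = 0 if incy >= 0 else (1 - n) * incy
    total = 0.0
    for _ in range(n):
        total += x[ix] * y[iy]
        ix += incx
        iy += incy
    return total


# Every other element of a stored row against a contiguous vector.
row = [1.0, 0.0, 2.0, 0.0]
print(dot(2, row, 2, [3.0, 4.0], 1))
```

Because every caller agrees on the same argument order and stride semantics, a tuned machine-specific version can replace the loop without any change to the calling programs.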

It  is  difficult  to  assess  the  importance  of  events  in  the  immediate  past, 
but  we  believe  that  the  recently  proposed  IEEE  standard  for  floating-point 
arithmetic  will  prove  to  be  important.  One  major  disappointment  in  numerical 
work has been the general lack of progress in designing clean computer arithmetic
systems. High quality software is supposed to be fail-safe and transportable;
it must work properly regardless of quirks in the host arithmetic system.
Software production is seriously hampered when computer arithmetic violates simple
arithmetic properties such as

1.0 * X = X,

X * Y = Y * X,

and

X + X = 2.0 * X.


There  exist  mainframes  of  recent  design  in  which  each  of  these  properties  fails 
for  appropriate  floating-point  X  and  Y.  Worse  yet,  on  some  machines  there  exist 
floating-point X > 0.0 such that

1.0 * X = 0.0,

X + X = 0.0,

or

[sqrt(X)]**2 = overflow or underflow.


All  these  anomalies  are  traceable  to  engineering  economies  [12].  Computer 
designers  repeatedly  ignore  complaints  about  such  mathematical  atrocities,  and 
new  anomalies  seem  to  appear  with  each  new  machine. 
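
The properties listed above can be probed mechanically. The following Python sketch is one way to do so, under stated assumptions: the function names `arithmetic_anomalies` and `sample_pairs` are hypothetical, the sampling scheme is ours, and the host is assumed to provide Python floats as IEEE double-precision numbers. It is a spot check over sampled values, not a proof that an arithmetic system is clean.

```python
import math
import random
import sys


def arithmetic_anomalies(pairs):
    """Return (description, x, y) for every sample violating a property.

    Assumes each x is positive and finite and each y is finite.
    """
    found = []
    for x, y in pairs:
        if 1.0 * x != x:
            found.append(("1.0 * X != X", x, y))
        if x * y != y * x:
            found.append(("X * Y != Y * X", x, y))
        if x + x != 2.0 * x:
            found.append(("X + X != 2.0 * X", x, y))
        if 1.0 * x == 0.0 or x + x == 0.0:
            found.append(("positive X collapsed to zero", x, y))
        root = math.sqrt(x)
        square = root * root
        if square == 0.0 or math.isinf(square):
            found.append(("sqrt(X) squared under/overflowed", x, y))
    return found


def sample_pairs(count, seed=0):
    """Build test pairs spanning many exponents, plus extreme values of X."""
    rng = random.Random(seed)

    def draw():
        return rng.uniform(1.0, 2.0) * 2.0 ** rng.randint(-1000, 1000)

    pairs = [(draw(), rng.choice((-1.0, 1.0)) * draw()) for _ in range(count)]
    tiny = 5e-324  # smallest positive subnormal double (IEEE assumption)
    extremes = (tiny, sys.float_info.min, sys.float_info.max)
    pairs.extend((x, 3.0) for x in extremes)
    return pairs


anomalies = arithmetic_anomalies(sample_pairs(10000))
print(len(anomalies), "anomalies found")
```

On an arithmetic system with the quirks described above, each entry of the returned list names the violated property and the operands that exposed it.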


That may be changing, however. By 1977 technology had advanced to the point
where small microprocessor manufacturers considered adding floating-point
arithmetic to their chips. In an unprecedented move they turned to numerical
analysts for advice. The result was the formation of a special subcommittee of
the IEEE Computer Society to draft a standard for binary floating-point arithmetic.
The draft they produced [1] is radically different from existing arithmetic
systems. Not only is it free of anomalies, but it also contains new
features specifically requested and designed by numerical analysts with software
experience. The first chips based on this proposed standard have now appeared
[29], and the first microcomputers are being delivered [30].

These,  then,  are  the  milestones  leading  to  where  we  stand  today:  the  early 
work  at  isolated  computing  centers,  the  establishment  of  SICNUM,  the  two  Purdue 
Symposia,  the  establishment  of  commercial  numerical  software  libraries,  the  NATS 
project  and  the  EISPACK  package,  the  establishment  of  IFIP  WG  2.5  and  of  TOMS, 
the  BLAs,  and  the  drafting  of  a  standard  for  floating-point  arithmetic.  Each  of 
these  events  added  something  new  and  important  to  the  movement.  There  have  also 
been  some  disappointments.  We  mention  in  particular  the  failure  to  achieve  full 
professional  recognition  for  software  work,  especially  at  universities,  the 
failure  to  fully  exploit  the  NATS  experience,  and  the  general  lack  of  progress 
in  mainframe  arithmetic  design. 


3. The Present. While the problems we face today are similar to those we faced
ten years ago, the solutions have become more complicated. We are still concerned
about the production of high-quality transportable software, but we
expect  more  from  such  software  now  than  we  did  in  the  past.  Therefore  it  is 
more  difficult  to  produce. 

The  last  section  pointed  to  many  thematic  numerical  software  packages. 
Some such as EISPACK and LINPACK are complete, while others such as MINPACK and
QUADPACK  are  still  under  development.  We  believe  it  is  significant  that  most  of 
the  early  success  involved  linear  algebra  programs.  It  is  true  that  linear 
algebra  is  a  fundamental  mathematical  tool  for  other  problem  areas,  such  as 
optimization  and  partial  differential  equations,  and  that  good  software  for 
these  other  problems  was  not  likely  to  be  produced  until  good  linear  algebra 
programs  were  ready.  But  it  is  also  true  that  linear  algebra  had  reached  an 
algorithmic  maturity  that  invited  software  production.  The  algorithms  were  well 
developed,  well  understood  and  backed  by  error  analysis  that  clearly  displayed 
the  limitations  of  software  implementations.  Because  the  production  of  EISPACK 
required  minimal  algorithmic  work  the  producers  could  concentrate  on  recasting 
algorithms  to  enhance  desirable  software  attributes.  The  effort  thus  produced  a 
significant  software  package  within  three  years  of  funding.  In  contrast,  the 
MINPACK  effort  required  about  five  years  to  produce  its  first  small  package. 
This  lengthy  development  time  reflects  the  difficulty  of  the  task  and  is  likely 
to be typical of future projects. As in many other fields, prominent researchers
in optimization do not agree on the best algorithms; new methods frequently
appear accompanied by confusing claims of superiority over existing methods and
programs. The situation is common in a vigorous, dynamic research field, but it
does not encourage the quick production of high-quality software. All the
'easy' implementations may have been done already.

Despite  these  difficulties,  we  believe  that  some  additional  problem  areas 
could  be  harvested  for  software  now.  We  are  frankly  puzzled  by  the  lack  of  an 
effort  in  ordinary  differential  equations,  for  example.  Existing  algorithms 
seem to be well enough understood, but no group has emerged with the necessary
dedication  and  support. 

There  is  one  other  little-understood  aspect  of  successful  numerical 
software  projects  that  we  believe  to  be  important.  Part  of  the  variation  in 
quality in the numerous PACKs previously mentioned is due to an improper appreciation
of a fundamental lesson from the NATS project. We stated above that
linear  algebra  was  in  a  good  algorithmic  position  when  the  EISPACK  work  began. 
That  does  not  mean  the  field  was  stagnant,  however;  new  algorithms  were  being 
introduced.  The  project  deliberately  ignored  new  work  because  it  felt  that 
algorithms  had  to  prove  themselves  before  being  included.  Further,  the  project 
found  that  there  is  a  one  to  two  year  delay  between  the  completion  of  the  first 
pass  at  software  and  its  final  release.  This  time  is  spent  iteratively  testing, 
revising and documenting to ensure that the package does what it claims. Thus
there must be a one to two year moratorium on the introduction of new material
into the package. This simple discovery has far-reaching implications. Algorithmic
researchers find it almost impossible to observe such a moratorium; they
are intent on wide distribution of their latest discoveries. Further, they cannot
effectively polish software they feel to be inferior. Therefore, control
over software projects should be vested in individuals who understand and are
dedicated to software production rather than in individuals who primarily produce
algorithms. Algorithm producers should be involved in software packaging,
but  they  should  not  control  it. 

There  is  another  advantage  to  this  approach.  Software  packages  require  a 
uniformity  of  style  to  simplify  documentation  and  maintenance.  As  EISPACK 
demonstrated,  different  programs  may  contain  large  segments  of  code  that  can  be 
rendered almost identical, e.g., by using similar variable names and identical
labels. The elements of a package also must adopt a uniform philosophy for
detecting and reporting errors. The necessary surgery to produce package uniformity
is best done by someone with no particular attachment to the original
programs.

Aside  from  algorithmic  development,  the  most  difficult  problem  facing  us 
today  is  testing.  There  are  two  fundamentally  different  reasons  for  testing, 
hence two fundamentally different approaches. On the one hand algorithm creators
want to show that their creations are in some way superior to existing
algorithms, and they approach performance testing as a contest. The tests they
design specifically highlight whatever advantage the new algorithm may have;
there is usually no attempt to uncover weaknesses in the algorithm or its implementation.

On the other hand, the selection of software for general use requires complete
performance evaluation. Usually some duplication of purpose is acceptable
in building a library, for example, so the concern is more with eliminating
unacceptable programs and matching programs to problem characteristics than
with determining the 'best' program. Tests for this purpose should aggressively
exercise  a  program  in  ways  that  will  detect  weaknesses,  display  strengths, 
explore  robustness  and  probe  problem-solving  ability.  We  liken  this  type  of 
testing  to  a  physical  examination.  Inevitably  the  results  of  such  testing  will 
be used to compare programs, but the original intent is that a program be examined
in isolation to stand or fall on its own merits.

Designing and implementing test programs is an important numerical problem
that  has  been  neglected  in  the  rush  to  produce  software  for  other  purposes. 
Software  testing  locates  weaknesses  and  leads  to  improvements  in  the  next 
software generation. Yet, except for the ELEFUNT package of transportable Fortran
test programs for the elementary functions [13] and collections of test
programs  for  optimization  software  [9,38],  no  thematic  test  packages  exist  to 
our  knowledge.  Some  test  materials  are  distributed  with  various  PACKs  mentioned 
earlier,  but  these  are  not  intended  for  general  use. 

The  trouble  is  that  we  know  little  about  how  to  test  most  types  of 
software.  Accuracy  tests,  for  example,  are  usually  battery  tests  exercising 
programs  on  someone’s  haphazard  collection  of  problems.  Not  only  is  this  time- 
consuming,  but  there  is  little  purpose  behind  what  is  done  and  the  mass  of  data 
gathered  may  be  incomprehensible  even  to  those  who  gathered  it.  We  must  find  a 
better  way.  We  must  back  off  from  the  problem  and  critically  examine  what  we 
are  doing;  every  test  should  have  a  purpose.  We  must  find  understandable  and 
useful  ways  to  present  test  results.  (Note  in  this  regard  the  clever  use  of 
Chernoff  faces  [10]  to  summarize  evaluations  of  software  for  solving  systems  of 
nonlinear equations [27].)

There are some leads in the literature that may prove useful. J. Lyness
and J. Kaganove show that numerical software falls into two broad classes [34].
Class 1 (precision bound) programs implement methods, called 'finite decision
methods', that are guaranteed to produce results in a finite number of steps.
Elementary function programs are examples of class 1 programs. The accuracy
achieved in class 1 programs usually approaches limits imposed by the computer
arithmetic system. All other programs are class 2 (heuristic bound) programs
implementing 'unreliable exact arithmetic algorithms'. The algorithms are such
that useful results are not guaranteed in a finite number of steps even with
exact arithmetic. Results that are produced are usually limited in accuracy by
the algorithm and not by machine arithmetic. Quadrature and optimization
programs are usually class 2.

The  importance  of  this  classification  is  that  while  accuracy  test  results 
for class 1 programs vary with the operating system, compiler and machine,
properly structured accuracy tests for class 2 programs produce system-independent
results  when  the  accuracy  achieved  is  sufficiently  above  machine  limits.  Thus 
certain  types  of  accuracy  tests  for  class  2  software  need  be  done  only  once  and 
only  on  one  system. 

But  accuracy  testing  is  just  part  of  a  complete  test  package;  efficiency 
and  robustness  are  also  important.  Because  class  2  programs  frequently  require 
user-supplied  software  with  an  unpredictable  effect  on  timing,  other  measures  of 
efficiency,  such  as  the  number  of  accesses  to  the  user-supplied  program,  must  be 
used.  Where  efficiency  varies  significantly  from  problem  to  problem,  it  is 
important  to  explore  efficiency  as  a  function  of  the  problem  space.  In  its  most 
elegant  form  to  date,  efficiency  testing  has  been  combined  with  accuracy  testing 
and parameterization of a problem space to produce 'performance profiles'. The
prototype  work  on  automatic  quadrature  programs  [35]  produced  curves  combining 
probability  of  success  and  expected  number  of  integrand  evaluations  as  functions 
of  requested  accuracy  for  specific  parameterized  problem  families.  Curves  for  a 
problem  family  with  features  similar  to  those  in  a  particular  application  should 
be useful in selecting a program for that application based on balancing
requested accuracy and predicted cost against the probability of success. The
concepts of software classification and performance profiles exemplify the
abstract assault on evaluation procedures that we believe is essential to
progress in this area. Except for [33], these ideas have not been exploited beyond
the work cited.
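
As a rough illustration of the performance-profile idea, the following Python sketch (not taken from [35]) runs a simple adaptive Simpson integrator on a hypothetical one-parameter problem family, a peak of fixed width at a random location, and tabulates for each requested accuracy the fraction of problems solved to that accuracy and the mean number of integrand evaluations. The integrator, the problem family and all names are assumptions chosen for the example.

```python
import math
import random

def adaptive_simpson(f, a, b, tol, max_depth=30):
    """Adaptive Simpson quadrature; returns (estimate, integrand evaluations)."""
    count = 0

    def g(x):
        nonlocal count
        count += 1          # count accesses to the user-supplied integrand
        return f(x)

    def recurse(a, fa, m, fm, b, fb, whole, tol, depth):
        lm, rm = 0.5 * (a + m), 0.5 * (m + b)
        flm, frm = g(lm), g(rm)
        left = (m - a) / 6.0 * (fa + 4.0 * flm + fm)
        right = (b - m) / 6.0 * (fm + 4.0 * frm + fb)
        err = left + right - whole
        if depth <= 0 or abs(err) <= 15.0 * tol:
            return left + right + err / 15.0
        return (recurse(a, fa, lm, flm, m, fm, left, tol / 2.0, depth - 1)
                + recurse(m, fm, rm, frm, b, fb, right, tol / 2.0, depth - 1))

    m = 0.5 * (a + b)
    fa, fm, fb = g(a), g(m), g(b)
    whole = (b - a) / 6.0 * (fa + 4.0 * fm + fb)
    return recurse(a, fa, m, fm, b, fb, whole, tol, max_depth), count

def peak_problem(lam, width=0.05):
    """Hypothetical problem family: Lorentzian peak centered at lam on [0, 1]."""
    f = lambda x: width / ((x - lam) ** 2 + width ** 2)
    exact = math.atan((1.0 - lam) / width) + math.atan(lam / width)
    return f, exact

def performance_profile(tolerances, samples=200, seed=1):
    """For each tolerance: (tolerance, success fraction, mean evaluations)."""
    rng = random.Random(seed)
    lams = [rng.random() for _ in range(samples)]
    profile = []
    for tol in tolerances:
        successes, evals = 0, 0
        for lam in lams:
            f, exact = peak_problem(lam)
            approx, n = adaptive_simpson(f, 0.0, 1.0, tol)
            successes += abs(approx - exact) <= tol
            evals += n
        profile.append((tol, successes / samples, evals / samples))
    return profile

if __name__ == "__main__":
    for tol, p, cost in performance_profile([1e-2, 1e-4, 1e-6, 1e-8]):
        print(f"requested {tol:.0e}: success {p:.2f}, mean evaluations {cost:.0f}")
```

Counting integrand evaluations rather than elapsed time follows the text's point that the cost of user-supplied software is unpredictable; the two tabulated columns are the quantities a performance profile would plot against requested accuracy.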

Concerns  for  numerical  software  have  spawned  important  work  in  other  fields 
as  well.  For  example,  research  on  the  TAMPR  system  [6]  for  automated  program 
transformation  and  maintenance  was  specifically  motivated  by  early  NATS  work. 
TAMPR  is  intended  to  accept  programs  in  certain  standard  languages,  map  them 
into abstract forms, make transformations on these abstract forms, and finally
recover  specific  realizations  of  the  transformed  programs  in  standard  languages 
again.  The  transformations  are  limited  conceptually  only  by  our  ability  to 
describe  what  must  be  done.  An  early  version  of  the  system  was  used  to  realize 
all versions of the LINPACK programs from complex single precision prototypes,
for example. This application included enforcing formatting conventions and
selectively implanting either calls to BLAS routines or inline coding with BLAS
functionality, depending on the particular target computer host. Ultimately the
capabilities  may  include  automatic  translation  from  one  programming  language  to 
another  by  simply  specifying  different  source  and  target  languages  in  the  first 
and  last  steps. 

TAMPR  is  only  one  of  many  useful  tools  now  under  development.  The  TOOLPACK 
project  is  working  on  an  extensive  collection  of  software  tools  specifically 
designed to simplify the writing, testing, analyzing and maintaining of numerical
software. The package is to combine the capabilities of TAMPR with those of
formatters like POLISH [18], static analyzers like DAVE [41] and PFORT [49],
dynamic analyzers like NEWTON [20], and other as yet unspecified tools including
text  editors.  Specification  of  the  package  is  still  incomplete,  but  there  is 
agreement  that  the  package  will  be  portable  and  that  package  elements  will  be 
compatible  in  data  requirements.  Release  of  a  prototype  version  for  evaluation 
and  comment  is  tentatively  set  for  late  1982. 

We  earlier  mentioned  the  work  of  the  IEEE  on  standardization  of  binary 
floating-point arithmetic for microprocessors. That is only one instance of a
wide concern for computer arithmetic. The IEEE has recently established a
second subcommittee to draft a radix- and format-independent floating-point
standard that will be upward compatible with the previous effort. Although the new
draft is again intended for microprocessors, its inclusion of non-binary
arithmetics should interest designers of larger equipment.

The  fruits  of  such  standardization  efforts  will  not  become  widely  available 
for  some  time,  however.  In  the  meantime  we  are  forced  to  write  software  for 
existing  computers.  We  can  improve  the  portability  of  software  among  such 
machines  by  explicitly  including  environmental  dependencies  in  the  source  code. 
There  have  been  several  attempts  to  establish  a  fundamental  set  of  parameters 
describing arithmetic systems for this purpose. IFIP WG 2.5 published one
proposal [21] that has proven unsatisfactory in many respects and has not been
widely used. A second proposal [8] related to Brown's model for floating-point
arithmetic [7] has received important support in some areas. The entire
arithmetic model is imbedded in ADA [52], for example, much to the consternation of
some numerical analysts. We return to that in a moment. Still a third proposal
is being considered by the ANSI X3J3 Fortran Standards Committee for inclusion
in the next Fortran standard. This proposal defines certain parameters and
reserves  their  names  in  the  same  way  that  SIN  is  a  reserved  name.  The  parameter 
names  are  then  aliases  for  numerical  values  appropriate  to  the  particular  host 
environment.  The  difference  between  this  approach  and  the  ADA  approach  is  that 
here only the names are specified; the numerical values provided are
implementation dependent. While the parameters are based on a model of an arithmetic
system, the model is not imposed by the standard. Thus the details of the model
used  in  a  particular  situation  can  be  chosen  to  fit  the  circumstances.  When 
portability  is  crucial  the  model  can  be  chosen  to  conservatively  estimate 
machine  parameters;  when  local  performance  is  important  the  model  can  be  chosen 
to  closely  approximate  the  local  system.  Such  flexibility  is  not  available  in 
the  ADA  approach  where  the  model  specified  must  be  conservative  to  be  universal. 
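
The named-parameter idea can be illustrated in a present-day language. In the sketch below, Python's `sys.float_info` plays the role of the reserved parameter names: the names are fixed, while the values are supplied by the host implementation. The routine `scaled_norm` is hypothetical, chosen only to show portable code that consults the environment rather than hard-coding machine constants.

```python
import math
import sys

# Names are fixed by the language; values come from the host environment.
RADIX = sys.float_info.radix        # base of the floating-point representation
DIGITS = sys.float_info.mant_dig    # number of base-RADIX digits in the significand
EPS = sys.float_info.epsilon        # spacing between 1.0 and the next larger float
TINY = sys.float_info.min           # smallest positive normalized float
HUGE = sys.float_info.max           # largest finite float

def scaled_norm(values):
    """Euclidean norm that avoids premature overflow.

    Hypothetical example: rescale by the largest magnitude so that squaring
    cannot overflow, whatever HUGE happens to be on the host machine.
    """
    biggest = max((abs(v) for v in values), default=0.0)
    if biggest == 0.0:
        return 0.0
    return biggest * math.sqrt(sum((v / biggest) ** 2 for v in values))

if __name__ == "__main__":
    print("radix", RADIX, "digits", DIGITS, "eps", EPS)
    big = math.sqrt(HUGE)            # squaring 2*big directly would overflow
    print(scaled_norm([2.0 * big, 2.0 * big]))
```

Because only the names appear in the source, the same program text runs unchanged on a host with a different radix or precision.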

The  activities  and  concerns  just  outlined  are  typical  of  the  mathematical 
software  effort  today.  Several  large  software  projects  are  underway;  others  are 
planned.  There  are  many  ancillary  activities  aimed  at  improving  the  environment 
for  software  production  and  use.  But  there  are  also  difficult  problems  that  are 
not  being  addressed.  We  are  not  making  much  progress  in  testing  methodology, 
for  example. 


4.  The  Future.  Prediction  of  the  future  is  always  risky.  Nevertheless  we 
present a few guesses at what lies ahead. We expect that the quantity and
quality of numerical software will continue to increase and that the activities just
described will flourish in the future. Advancing technology and even the
present success of the numerical software effort pose problems that must be
overcome, however.

The  most  significant  problem  we  face  plagues  every  technical  field  and  has 
been with us for a long time: communications at all levels. As we become more
specialized we lose touch with one another and especially with potential
customers.


Good communication with customers is crucial. Superb software is worthless
unless software consumers are persuaded to use it. It is not enough to
make  users  aware  of  software  existence,  though  that  is  a  difficult  task  in 
itself;  consumer  lethargy  must  be  overcome  at  the  same  time.  Consumers  are 
reluctant  to  modify  running  programs  unless  they  are  convinced  that  the  software 
they  are  currently  using  is  inferior  enough  to  endanger  their  work  and  that  the 
new  software  will  remove  that  danger.  Open  literature  publications  have  never 
solved  this  type  of  communications  problem.  The  consumers  we  must  reach  are 
applications  people  who  do  not  read  numerical  analysis  or  mathematical  software 
literature.  We  must  find  other  ways  to  reach  them. 

Several  years  ago  both  the  Albuquerque  and  Livermore  branches  of  Sandia 
Laboratories inserted library monitors in their operating systems [4]. These
monitors  provided  information  on  who  was  using  which  routines  and  on  the  values 
of  certain  parameters  in  the  initial  calls  to  those  routines.  This  information 
proved  valuable  to  both  the  librarians  and  the  users.  It  led  to  improvement  of 
frequently used programs and provision of new special purpose programs for
problems previously solved with general purpose routines. It also permitted
personal contact when it appeared that a program was being misused, when program
bugs were found, or when better programs became available. Of course, diplomacy
and tact were essential in these contacts. In a few cases users objected when
they felt their privacy was being invaded or they did not appreciate proffered
advice. Sandia Livermore Laboratories augmented personal contact with an
advertising campaign in which new programs were featured on posters prominently
placed in all terminal rooms. Such efforts are noteworthy, rare and
insufficient.
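
A library monitor of the kind described can be sketched in a few lines. The Python decorator below is a hypothetical illustration, not a description of the Sandia monitors [4]: it tallies which user called which routine and keeps the arguments of the initial call only. The names `monitored`, `usage_counts`, `first_call_args` and the stand-in routine `trapezoid` are ours.

```python
import functools
import os
from collections import Counter

usage_counts = Counter()     # (user, routine) -> number of calls
first_call_args = {}         # (user, routine) -> arguments of the initial call

def current_user():
    """Best-effort user name; falls back to 'unknown' if none is set."""
    return os.environ.get("USER") or os.environ.get("USERNAME") or "unknown"

def monitored(routine):
    """Wrap a library routine so that each call is tallied per user."""
    @functools.wraps(routine)
    def wrapper(*args, **kwargs):
        key = (current_user(), routine.__name__)
        usage_counts[key] += 1
        # Record parameters of the first call only, as in the text.
        first_call_args.setdefault(key, (args, dict(kwargs)))
        return routine(*args, **kwargs)
    return wrapper

@monitored
def trapezoid(f, a, b, n=100):
    """Stand-in library routine: composite trapezoid rule."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

if __name__ == "__main__":
    trapezoid(lambda x: x * x, 0.0, 1.0, n=50)
    trapezoid(lambda x: x * x, 0.0, 2.0)
    for (user, name), calls in usage_counts.items():
        print(user, name, calls, first_call_args[(user, name)][1])
```

A librarian reading such tallies could see, for instance, that a general purpose routine is being called repeatedly with the same special parameter values, which is the situation in which the text reports that special purpose programs were provided.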

Today  we  face  a  revolution  in  the  way  computers  are  being  used.  The  small 
'personal' computer is becoming common at Argonne, and elsewhere as well, we
suspect. While it is often acquired for monitoring experiments and gathering
data, the temptation to use it for numerical purposes is strong. This is
especially true when the cost of using a central computing facility grows and the
'free' personal machine would otherwise sit idle. Such usage is not necessarily
bad,  because  smaller  machines  are  approaching  the  hardware  capabilities  of 
larger  machines  of  only  a  few  years  ago.  Software  is  the  problem.  Owners  of 
such machines frequently write their own software or obtain it from friends. In
this respect they operate as large computing centers did twenty years ago.
The  software  movement  has  completely  lost  whatever  contact  it  may  have  had  with 
these  users  and  that  contact  will  be  difficult  to  regain. 

One  possibility  may  be  to  contribute  to  the  journals  many  of  these  people 
read.  Byte,  Personal  Computing  and  the  like  are  often  sources  of  information 
for  such  users.  While  some  of  the  articles  in  these  journals  are  written  by 
highly  qualified  people,  much  of  the  numerical  advice  is  amateurish,  reflecting 
techniques  that  lost  favor  long  ago.  We  cannot  legitimately  complain  about  this 
situation  unless  we  are  willing  as  a  profession  to  provide  the  proper  advice  and 
software through these journals. We must be the ones to initiate
communications.

Unfortunately,  we  are  also  losing  whatever  communication  we  had  with  users 
of  the  larger  machines.  Often  the  original  motivation  for  numerical  software 
work was provided by users with applications that were endangered by poor
computer programs. As our effort has matured many of us have become more concerned
with  software  production  for  the  sake  of  production  and  less  concerned  about  the 
real  needs  of  users.  We  have  tended  to  communicate  among  ourselves  and  to 
neglect  the  users.  Perhaps  that  behavior  pattern  is  typical  of  a  new  field.  We 
hope  that  it  will  change  in  our  field. 

At the technical level we find challenges posed by new computer hardware.
We  have  only  begun  to  work  on  algorithms  and  software  for  parallel  and  vector 
machines,  and  now  we  are  faced  with  microprocessors  as  well.  Their  coming  is  an 
exciting event for numerical software people. The IEEE arithmetic standard
provides computational capability that was not previously available at any level.
In addition to sophisticated handling of underflow and overflow, standard-
conforming systems must provide square root and mod functions, among others,
that are as accurate as the usual arithmetic operations. Some early
implementations of the standard include square root in the hardware where it becomes no
more  expensive  to  use  than  an  ordinary  division  operation.  This  combination  of 
speed  and  accuracy  in  square  root  coupled  with  other  features  must  influence  our 
selection  of  algorithms.  I  believe  we  will  see  dramatic  changes  in  algorithms, 
software  and  even  computer  languages  as  these  new  microprocessors  become  common. 
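
The requirement that square root be as accurate as the basic operations can be checked directly on a conforming system. The sketch below assumes that Python floats are IEEE binary doubles and that `math.sqrt` maps to a conforming square root, which is typical but not guaranteed by the language; it also assumes Python 3.9 or later for `math.ulp`. It uses exact rational arithmetic to test that the returned root lies within half a unit in the last place of the true root. The helper name `sqrt_within_half_ulp` is ours.

```python
import math
import random
from fractions import Fraction

def sqrt_within_half_ulp(x):
    """True if math.sqrt(x) is within half an ulp of the exact square root.

    The comparison is made on squares in exact rational arithmetic, so no
    rounding error enters the check itself. Intended for x > 0.
    """
    r = math.sqrt(x)
    if x == 0.0:
        return r == 0.0
    half = Fraction(math.ulp(r)) / 2
    lo, hi = Fraction(r) - half, Fraction(r) + half
    return lo * lo <= Fraction(x) <= hi * hi

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [rng.uniform(1e-3, 1e3) for _ in range(10000)]
    failures = [x for x in samples if not sqrt_within_half_ulp(x)]
    print("samples outside half an ulp:", len(failures))
```

This is a necessary condition for correct rounding rather than a full conformance test; it is offered only to show how cheaply such a property can be probed.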

Overall we view the future with confidence and expectation. We will probably
never satisfactorily solve the communications problem, but we expect that
the quality of numerical software will continue to improve and that software
production will become easier as new tools and hardware appear.


SOFTWARE RELIABILITY ESTIMATION
THROUGH  FITTING  A  POPULATION  PROCESS  TO  DATA  * 

Marc  R.  Stromberg 

Bell  Technical  Operations  Corporation 
Software/Computer  Evaluation  Facility 
600  North  Garden  Avenue 
Sierra  Vista,  Arizona  85635 

ABSTRACT.  Characteristics  of  software  reliability  can  be  stated  in  terms 
of  the  mean  functions  of  various  transition  counting  processes  associated 
with  a  birth-death  process. 

Many  of  these  counting  processes  are  shown  to  share  the  attribute  of 
Poisson  processes  that  the  higher  moment  functions  can  be  expressed  in 
terms  of  the  mean. 

Such  a  dependence  provides  a  computational  basis  for  estimation  of  the 
mean  functions,  hence  of  software  reliability,  by  any  method  (such  as 
generalized  least  squares)  which  uses  only  a  few  moments. 

1. Introduction. Software, unlike hardware, can in principle evolve to
perfection through the discovery and removal of error. Large computer programs
often  go  through  a  phase  of  testing  for  faults  (errors,  discrepancies  from 
intended function) and concurrent software modification for removal of
discovered faults. Software reliability measurement, if viewed as an estimate
of  the  error  content  of  a  computer  program,  must  account  for  both  discovery 
and  removal  of  errors.  A  birth  and  death  process  is  an  appropriate  model 
for  the  discovery  and  removal  process,  if  the  emphasis  is  on  the  active 
(discovered  but  not  yet  removed)  faults. 

Active  faults  arrive  through  testing  a  program  with  "typical"  input, 
and  depart  through  the  efforts  of  debuggers  to  remove  them.  A  population 
process  can  trace  the  evolution  of  software  reliability  by  modeling  the 
decline  of  the  active  fault  population  toward  eventual  extinction. 

To  be  effective,  a  model  for  the  active  fault  population  must  have  the 
properties  that  it  can  fit  data  observed  of  a  particular  program,  and 
that  useful  conclusions  can  be  drawn  from  the  model  which  are  not  obvious 
from  the  data. 

This  paper  considers  conditions  on  the  active  fault  population  process 
that  allow  both  fitting  the  process  to  data  and  subsequent  computations 
relevant  to  reliability. 

From  data  consisting  of  recorded  times  of  faults  and  times  of  repairs, 
the  procedure  to  be  described  attempts  to  estimate,  among  other  things,  the 
one-dimensional  distributions  of  the  count  of  active  faults. 


* Sponsored by the United States Army under Contract Number DAEA18-77-C-0134.

2. Notation. $X:\Omega\times T\to E^n$ is a stochastic process defined on the
measure space $\Omega$ (with complete probability measure Pr) and on the parameter
set $T=[0,\infty)$, with values in euclidean space $E^n$. X has random variables
$X_t(\cdot)=X(\cdot,t)$ and sample functions $X_w(\cdot)=X(w,\cdot)$.

The letters i, j, k represent integers or elements of $L^n$, the integer
lattice points of $E^n$.

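For $j\in L^n$ and $I\subset T$ any interval,
$H_j(I)=\{w\in\Omega : w\times I\subset X^{-1}(j)\}$,
$H_j=H_j(T)$, and $H(I)=\bigcup_j H_j(I)$.

R represents any irreflexive relation on $L^n$, and

\[ A_j(w)=\bigcup\{I : w\in H_j(I)\} \quad\text{for } j\in L^n, \]

\[ A_R(w)=\bigcup\{A_j(w) : j\in\operatorname{support} R\}, \]

and

\[ A^R(w)=\bigcup\{A_k(w) : k\in\operatorname{range} R\}, \quad\text{for } w\in H_R. \]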


For $t\in T$, $D_t$ is the set of $w\in\Omega$ such that the sample function $X_w$ has a
discontinuity at t.


3.  Assumptions.  We  consider  hypotheses  on  X  that  are  suited  both  to  computing, 
from  data,  estimates  of  the  one-dimensional  distributions  of  X,  and  to 
computing  distributions  of  random  variables  associated  with  processes 
that  enumerate  particular  discontinuities  of  the  sample  functions  of  X. 

X itself will be a birth-death (or similar) process, with software
faults corresponding to births and software repairs (fault removals)
corresponding to deaths. A fault will be said to be active if software input
conditions have caused the fault to be executed at least once and it has
not yet been removed, and activated if the fault is or has been active.

The sample function value $X_w(t)$ is the number of active faults at
time t.


Among  the  counting  processes  related  to  X  that  will  be  of  interest  are 
the  count  of  activated  faults,  the  count  of  repairs,  and  the  count  of 
active  fault  population  extinctions  (that  is,  the  number  of  times  the  only 
active  fault  is  removed). 
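
To make these counting processes concrete, the following Python sketch simulates one sample path of a simple time-homogeneous birth-death process and counts activations, repairs and extinctions along the way. The constant activation rate `lam` and per-fault repair rate `mu` are assumptions made for the illustration; the paper itself allows time-dependent transition intensities.

```python
import random

def simulate_active_faults(lam, mu, horizon, seed=None):
    """Simulate the number of active faults on [0, horizon].

    Births (fault activations) occur at rate lam > 0; each active fault is
    repaired at rate mu. Returns the jump history and three counts.
    """
    rng = random.Random(seed)
    t, active = 0.0, 0
    activated = repairs = extinctions = 0
    path = [(0.0, 0)]                      # (jump time, active faults after jump)
    while True:
        total_rate = lam + mu * active
        t += rng.expovariate(total_rate)   # waiting time to the next jump
        if t > horizon:
            break
        if rng.random() < lam / total_rate:
            active += 1                    # transition (j, j+1): a fault is activated
            activated += 1
        else:
            active -= 1                    # transition (j, j-1): a fault is repaired
            repairs += 1
            if active == 0:
                extinctions += 1           # the only active fault was removed
        path.append((t, active))
    return path, activated, repairs, extinctions

if __name__ == "__main__":
    path, born, fixed, emptied = simulate_active_faults(2.0, 1.0, 10.0, seed=3)
    print("activated", born, "repairs", fixed, "extinctions", emptied,
          "active at end", path[-1][1])
```

Each of the three counts selects a particular kind of jump of the sample path: activations are the jumps (j, j+1), repairs the jumps (j, j-1), and extinctions the single jump (1, 0).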


X is assumed to satisfy:

A1. $\Pr(D_t)=0$ for all $t\in T$; that is, X has no fixed points of discontinuity;
A2. $\pi_i\circ X_w:T\to E^1$ is an integer valued step function with unit jump
for all $w\in\Omega$, where $\pi_i:E^n\to E^1$ is the ith projection, $1\le i\le n$;
A3. X is a Markov process, and

A3.1 there are piecewise right continuous functions $q_{jk}(\cdot)$ such that

\[ P_{jk}(t,u)=\begin{cases} q_{jk}(u)(u-t)+o(u-t), & j\ne k,\\ 1-q_{jj}(u)(u-t)+o(u-t), & j=k,\end{cases}
\qquad\text{where } q_{jj}(u)=\sum_{k\ne j}q_{jk}(u), \]

for $j,k\in L^n$, $t<u$, for the Markov transition functions
$P_{jk}(t,u)=\Pr\{X_u=k\mid X_t=j\}$;

A3.2 $P(j,t)=\Pr\{X_t=j\}$ is left continuous for all $j\in L^n$; and
A4. $\Pr(E(t,u))=o(u-t)$, where $E(t,u)$ is the set of sample functions with
at least two discontinuities on (t,u).

We will call X refinable if X satisfies A1 through A4.

4. Properties of a refinable process. To select particular discontinuities
of X, we define the following processes. For any irreflexive relation
R on $L^n$, and $s\ge 0$, define $N_s:\Omega\times T\to E^1$ by $N_s(w,t)$ = the number of
discontinuities $u\in(s,t)$ of the sample function $X_w(\cdot)$ such that
$(X_w(u^-),X_w(u^+))\in R$, if $t>s$; and $N_s(w,t)=0$ if $t\le s$.

By A1, $N_s(\cdot,t)=N_s(\cdot,u)+N_u(\cdot,t)$ almost surely for $s<u<t$.

For future reference (and to call $N_s$ a counting process) we note

Lemma 1. For $y,z\in T$, $y<z$, $N_y(\cdot,z):\Omega\to E^1$ is a random variable
measurable on the sample space of $\{X_t: t\in(y,z)\}$ (the $\sigma$-algebra generated by
$X_t^{-1}(B)$ for $t\in(y,z)$, Borel sets $B\subset E^n$).

Proof: For $r,s\in T$ and $j,k\in L^n$ define

\[ A_{rs}(j,k)=\bigcap_{q\in(r,s)}\bigl(H_j[r,q)\cup H_k(q,s]\bigr) \]

where the intersection is over rational $q\in(r,s)$ and where $H_j[r,q)\equiv H_j(I)$
for $I=[r,q)$.

For $n>0$ let $Q_n$ be any ordered sequence of rationals $r_1<s_1<\dots<r_n<s_n$,
and for irreflexive relation R defining $N_s$ let

\[ A(Q_n,R)=\bigcap_{i=1}^{n}\ \bigcup_{(j,k)\in R}A_{r_i s_i}(j,k). \]

Finally, let

\[ M_n(y,z)=\bigcup_{Q_n\subset(y,z)}A(Q_n,R). \]

Then $M_n(y,z)$ differs from the set $\{w\in\Omega: N_y(w,z)\ge n\}$ by the set
$\bigcup_q D_q$ (union over rational q) of measure 0. QED.

Of course, the process $N_s$ does not contain more information than X;
however, the following definition is useful.

Definition. For fixed $s\ge 0$ and irreflexive relation R on $L^n$, the refinement of
$X:\Omega\times T\to E^n$ (satisfying A1-A4) is $\tilde X:\Omega\times T\to E^{n+1}$ where
$\tilde X(w,t)=(X(w,t),N_s(w,t))$.

The following theorem justifies calling $\tilde X$ a refinement of X.

Theorem 1. If X is a refinable process, then $\tilde X$ also is. The transition
intensities of X are inherited by $\tilde X$, in the sense of Corollary 1.1.

Proof: The discontinuities of $\tilde X_w$ coincide with those of $X_w$, so the refinement
$\tilde X$ satisfies A1, A2 and A4. To show A3, interpret "equal" as "equal almost
surely", and let $Y_{u,k}$ be the characteristic function of the set $X_u^{-1}(k)$ and
let $Z^y_{u,n}$ be the characteristic function of the set $\{w: N_y(w,u)=n\}$.

For $s<t<u$, $N_s(\cdot,u)=N_s(\cdot,t)+N_t(\cdot,u)$, so

\[ Z^s_{u,n}=\sum_{i=0}^{n}Z^s_{t,i}\,Z^t_{u,n-i}, \]

with $Y_{u,k}Z^s_{u,n}$ equal to the characteristic function of the set
$\tilde X_u^{-1}(k,n)$. Now $Z^s_{t,i}$ is measurable on the sample space of $N_s(\cdot,t)$,
and the sample space of $\{\tilde X_r: r\le t\}$ is equal to the sample space of
$\{X_r: r\le t\}$. Therefore,

\[ E[Y_{u,k}Z^s_{u,n}\mid\tilde X_r: r\le t]=\sum_{i=0}^{n}Z^s_{t,i}\,E[Y_{u,k}Z^t_{u,n-i}\mid X_r: r\le t]. \]

$Y_{u,k}Z^t_{u,n-i}$ is measurable on the sample space of $\{X_r: r\ge t\}$, so by the
Markov property of X,

\[ E[Y_{u,k}Z^s_{u,n}\mid\tilde X_r: r\le t]=\sum_{i=0}^{n}Z^s_{t,i}\,E[Y_{u,k}Z^t_{u,n-i}\mid X_t]. \]

The last sum is measurable on the sample space of $\tilde X_t$ and so

\[ E[Y_{u,k}Z^s_{u,n}\mid\tilde X_r: r\le t]=E[Y_{u,k}Z^s_{u,n}\mid\tilde X_t]; \]

that is, $\Pr\{\tilde X_u=(k,n)\mid\tilde X_r: r\le t\}=\Pr\{\tilde X_u=(k,n)\mid\tilde X_t\}$.

For $t<s<u$, $E[Y_{u,k}Z^s_{u,n}\mid\tilde X_r: r\le t]=E[Y_{u,k}Z^s_{u,n}\mid X_r: r\le t]$ as above.
By the Markov property of X, the last expectation is $E[Y_{u,k}Z^s_{u,n}\mid X_t]$ as
before, which is measurable on the sample space of $\tilde X_t$, and the result follows.

If $t<u\le s$, $Y_{u,k}Z^s_{u,n}=0$ if $n\ge 1$, and

\[ E[Y_{u,k}Z^s_{u,0}\mid\tilde X_r: r\le t]=E[Y_{u,k}\mid X_r: r\le t]=E[Y_{u,k}\mid X_t]=E[Y_{u,k}Z^s_{u,0}\mid\tilde X_t], \]

so $\tilde X$ is a Markov process.

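To show A3.2 for the refinement $\tilde X$, let $A_t=X_t^{-1}(j)$, $B_t=\tilde X_t^{-1}(j,n)$ and
$P(j,n,t)=\Pr\{\tilde X_t=(j,n)\}$ for arbitrary $t\in T$, and $j\in L^k$ and n an integer,
both being fixed but arbitrary.

By A3.1, $\Pr(A_t\setminus A_u)=q_{jj}(u)(u-t)P(j,t)+o(u-t)$ for $t<u$, so
$\lim_{t\to u^-}\Pr(A_t\setminus A_u)=0$. Since $P(j,t)$ is left continuous,

\[ 0=\lim_{t\to u^-}|P(j,t)-P(j,u)|=\lim_{t\to u^-}|\Pr(A_t\setminus A_u)-\Pr(A_u\setminus A_t)| \]

and $\lim_{t\to u^-}\Pr(A_u\setminus A_t)=0$.

Now, $B_t\setminus B_u\subset(A_t\setminus A_u)\cup C_{t,u}$ where
$C_{t,u}=\{w: X_t=j,\ X_u=j,\ N_s(w,t)=n,\ N_s(w,u)\ne n\}$ and

\[ \Pr(C_{t,u})=\begin{cases} o(u-t), & s<u,\\ 0, & t<u\le s,\end{cases} \]

by A1, A4 and definition of $N_s$. Therefore, $\lim_{t\to u^-}\Pr(B_t\setminus B_u)=0$ and
similarly for $B_u\setminus B_t$, so $\lim_{t\to u^-}P(j,n,t)=P(j,n,u)$ and $\tilde X$ satisfies
A3.2. For the proof that $\tilde X$ satisfies A3.1, see Corollary 1.1. QED.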

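Corollary 1.1 Suppose X is refinable with refinement $\tilde X$, and let
$\tilde P_{jk}(u,t)=\Pr\{\tilde X_t=(k,n)\mid\tilde X_u=(j,m)\}$. Then,

1. If $s\le u<t$ then

\[ \tilde P_{jk}(u,t)=\begin{cases} P_{jk}(u,t)+o(t-u), & n=m+1 \text{ and } (j,k)\in R, \text{ or } n=m \text{ and } (j,k)\notin R,\\ o(t-u) & \text{otherwise.}\end{cases} \]

2. If $u<t\le s$ then

\[ \tilde P_{jk}(u,t)=\begin{cases} P_{jk}(u,t), & n=m=0,\\ 0 & \text{otherwise.}\end{cases} \]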


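Proof: Let $A=\tilde X_u^{-1}(j,m)$, and define $\tilde P_{jk}(u,t)$ as required if $\Pr(A)=0$.
If $s\le u<t$,

\[ \Pr\{\tilde X_t=(k,n),\tilde X_u=(j,m)\}=\int_A Y_{t,k}Z^s_{t,n}\,d\Pr=\int_A E[Y_{t,k}Z^s_{t,n}\mid\tilde X_r: r\le u]\,d\Pr
=\begin{cases}\Lambda(n-m), & 0\le m\le n,\\ 0 & \text{otherwise,}\end{cases} \]

where

\[ \Lambda(i)=\int_A E[Y_{t,k}Z^u_{t,i}\mid X_r: r\le u]\,d\Pr, \]

by the proof of Theorem 1. Since $s\le u$,

\[ \Lambda(i)\le\int_A Z^u_{t,i}\,d\Pr=o(t-u)\quad\text{if } i\ge 2, \text{ by A4.} \]

Now, $\int_A E[Y_{t,k}\mid X_r: r\le u]\,d\Pr=\Lambda(0)+\Lambda(1)+o(t-u)$, again by A4, and
$\Lambda(1)=o(t-u)$ if $(j,k)\notin R$ and $\Lambda(0)=o(t-u)$ if $(j,k)\in R$, by A4 and
definition of $N_s$.

Therefore, since $E[Y_{t,k}\mid X_r: r\le u]=\Pr\{X_t=k\mid X_u=j\}$ a.e. on A,

\[ \Pr\{\tilde X_t=(k,n),\tilde X_u=(j,m)\}=\begin{cases}\Pr\{X_t=k\mid X_u=j\}\Pr\{\tilde X_u=(j,m)\}+o(t-u), & n-m=1 \text{ and } (j,k)\in R, \text{ or } n-m=0 \text{ and } (j,k)\notin R,\\ o(t-u) & \text{otherwise,}\end{cases} \]

and the conclusion follows.

If $u<t\le s$, $\Pr\{\tilde X_t=(k,n),\tilde X_u=(j,m)\}=0$ if $m>0$ or $n>0$. For $n=0$, the
result is immediate. QED.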

The following definition is useful for application of the theorem.

Definition. The irreflexive relation R is an attachment of the process X if
for almost all $w\in H_R$ there is $(j,k)\in R$ such that
$\sup A_j(w)=\sup A_R(w)=\inf A^R(w)=\inf A_k(w)$.

Lemma 2. If R is an attachment of the refinable process X, then for almost
all $w\in H_R$ there is a unique discontinuity t of $X_w$ such that
$(X_w(t^-),X_w(t^+))\in R$.

Proof: $t=\sup A_R(w)=\inf A^R(w)$. QED.

If R is an attachment of X, define

\[ H_R(t,u)=\bigcup_{(j,k)\in R}X_t^{-1}(j)\cap X_u^{-1}(k) \]

for $t<u$, and define $\eta_R:H_R\to T$ by $\eta_R(w)=\inf A^R(w)$. Then

Lemma 3. If R is an attachment, $\eta_R\in(t,u)$ for almost all $w\in H_R(t,u)$.

Proof: $H_R(t,u)\subset H_R\cup D_t\cup D_u$, and $t<\sup A_R(w)$ and $u>\inf A^R(w)$ for
$w\notin D_t\cup D_u$. QED.

$\eta_R$ induces a real Borel measure $\mu_R$ defined by $\mu_R(E)=\Pr(\eta_R^{-1}(E))$ for
Borel sets $E\subset T$.

The cumulative distribution function of $\mu_R$, $F_R$, defined by $F_R(t)=\mu_R([0,t))$,
satisfies $F_R(u)-F_R(t)=\mu_R(t,u)$ for $t<u$ by A1. By A4,
$\Pr(H_R(t,u))=F_R(u)-F_R(t)+o(u-t)$ and therefore,

Proposition 1. If R is a finite attachment, then

\[ F_R'(u)=\sum_{(j,k)\in R}q_{jk}(u)P(j,u) \]

for all but finitely many $u>0$.

Proof. $F_R'(u)$ exists if and only if the two sided limits

\[ \lim_{t\to u^-}\frac{\Pr(H_R(t,u))}{u-t}\quad\text{and}\quad\lim_{t\to u^+}\frac{\Pr(H_R(u,t))}{t-u} \]

exist and are equal. Since

\[ \Pr(H_R(t,u))=\sum_{(j,k)\in R}\Pr\{X_u=k\mid X_t=j\}\Pr\{X_t=j\}=\sum_{(j,k)\in R}q_{jk}(u)P(j,t)(u-t)+o(u-t), \]

the conclusion follows by A3.1 and A3.2. QED.

If R is a relation on $L^m$, define $R_n$ as the relation $\{(j_n,k_{n+1}):(j,k)\in R\}$ on
$L^{m+1}$, where $j_n$ is the element of $L^{m+1}$ with (m+1)st coordinate n and which
coincides with j on $L^m$. Then

Lemma 4. If $X:\Omega\times T\to E^m$ is refinable, then for fixed $s\ge 0$ and irreflexive
relation R on $L^m$, $R_n$ is an attachment of the refinement $\tilde X$ for every $n\ge 0$.

Proof: Let $S=\bigcup_q D_q$ where q is rational, and choose $w\in H_{R_n}\setminus S$ and
rational r, t with $r\in A_{R_n}(w)$ and $t\in A^{R_n}(w)$.

Then $r<t$ since $N_s(w,t)>N_s(w,r)$. Arbitrary choice of r, t then shows
$\sup A_{R_n}(w)\le\inf A^{R_n}(w)$.

Since $w\notin D_r\cup D_t$, there is $u\in(r,t)$ with $(X_w(u^-),X_w(u^+))\in R$. By selection
of r and t, u necessarily satisfies $(\tilde X_w(u^-),\tilde X_w(u^+))\in R_n$, and
$u\le\sup A_{R_n}(w)\le\inf A^{R_n}(w)\le u$. QED.

Simplifying the notations for $H_{R_n}$ and $\eta_{R_n}$ to $H_n$ and $\eta_n$, define
$\eta_n:H_n\to T$ as done following Lemma 2, with induced measure $\mu_n$ and distribution
functions $F_n$. Note that the sets $\{w:N_s(w,t)\ge n+1\}$ and $\eta_n^{-1}([0,t))$ are equal
almost surely so that

Lemma 5.
\[ E[N_s(\cdot,t)]=\sum_{n\ge 1}\Pr\{N_s(\cdot,t)\ge n\}=\sum_{n\ge 0}\mu_n[s,t)=\sum_{n\ge 0}F_n(t). \]

Proposition 2. For $s\ge 0$ and finite irreflexive relation R on the state space
of the refinable process X, if $q_{jk}$ is a right continuous real function for
$(j,k)\in R$, then

\[ E[N_s(\cdot,t)]=\int_s^t\sum_{(j,k)\in R}q_{jk}(u)P(j,u)\,du\quad\text{for } t>s. \]

Proof: Let $P(j,n,t)=\Pr\{\tilde X_t=j_n\}$, where $\tilde X$ is the refinement of X. Then by
Proposition 1, Lemma 4, and Theorem 1,
$F_n'(u)=\sum_{(j,k)\in R}q_{jk}(u)P(j,n,u)$ for $u>s$.

$F_n(s)=0$ for $n\ge 1$ and the right hand derivative of $F_0$ at s is the upper
derivative $D\mu_0(s)$ of $\mu_0$ at s with respect to open segments, with
$D\mu_0(s)=\sum_{(j,k)\in R}q_{jk}(s)P(j,0,s)$.

Since $D\mu_n$ is finite everywhere for $n\ge 0$, $\mu_n$ is absolutely continuous with
respect to Lebesgue measure for $n\ge 0$ and

\[ F_n(t)=\int_s^t\sum_{(j,k)\in R}q_{jk}(u)P(j,n,u)\,du. \]

Then

\[ E[N_s(\cdot,t)]=\sum_{n\ge 0}F_n(t)=\int_s^t\sum_{(j,k)\in R}q_{jk}(u)P(j,u)\,du \]

by Lemma 5. QED.

For the next lemma, the following notation will be used. If $\{R_i\}$ is
any collection of relations on the state space of X, let
$\eta_{n,i}=\eta_{R_i(n)}:H_{R_i(n)}\to T$ for the process $\tilde X_i$ defined via $R_i$, n and s
as done previously, following Proposition 1. Use $\eta_{n,i}$ to induce Borel measures
$\mu_{n,i}$ and $\mu_i=\sum_{n\ge 0}\mu_{n,i}$, let $F_{n,i}(t)$ and $F_i(t)$ be the distribution
functions for $\mu_{n,i}$ and $\mu_i$ respectively, and let $N_{s,i}$ be the counting process
associated with $\tilde X_i$.

Then

Lemma 6. If $R_1\subset R_2\subset\dots$ is an ascending chain of relations whose union
is R, then $\lim_{i\to\infty}E[N_{s,i}(\cdot,t)]=E[N_s(\cdot,t)]$ where $N_s$ is the process
defined via R and s.

Proof: Let $M_{n,i}(y,z) = \bigcup_{Q_n \subset (y,z]} A(Q_n, R_i)$
as defined for Lemma 1, and check that

1.  $M_{n,i}(y,z) \subset M_{n,j}(y,z)$ if $R_i \subset R_j$, for $n \ge 1$, and

2.  $M_n(y,z) = \bigcup_i M_{n,i}(y,z)$, where $M_n(y,z) = \bigcup_{Q_n \subset (y,z]} A(Q_n, R)$.

Let $B_{1,n} = M_{n,1}(s,t)$ and

    $B_{i,n} = M_{n,i}(s,t) - M_{n,i-1}(s,t)$  for $i > 1$,

and note that $\sum_{i=1}^{k} \Pr(B_{i,n}) = \Pr(M_{n,k}(s,t)) = \mu_{n-1,k}(s,t]$.

Then

    $E[N_t^s] = \sum_{n \ge 0} \mu_n(s,t] = \sum_{n \ge 0} \sum_{i \ge 1} \Pr(B_{i,n+1}) = \lim_{k \to \infty} \sum_{n \ge 0} \mu_{n,k}(s,t]$
              $= \lim_{k \to \infty} F_k(t) = \lim_{k \to \infty} E[N_t^{s,k}]$.  QED.

Corollary 2.1  If $R$ is any irreflexive relation on the state space of $X$ and
$q_{jk}$ is a right continuous real valued function for $(j,k) \in R$, then

    $E[N_t^s] = \int_s^t \sum_{(j,k) \in R} q_{jk}(u) P(j,u)\,du$  for $t \ge s$.

Proof: Assume $R$ infinite and let $R_1 \subset R_2 \subset \dots$ be an ascending chain of
finite relations whose union is $R$.

Then Proposition 2 gives

    $E[N_t^{s,i}] = \int_s^t g_i(u)\,du$,  where $g_i(u) = \sum_{(j,k) \in R_i} q_{jk}(u) P(j,u)$.

Now, $\{g_i(u)\}$ is a monotone sequence with

    $g_i(u) \to \sum_{(j,k) \in R} q_{jk}(u) P(j,u)$,

and so

    $\int_s^t g_i(u)\,du \to \int_s^t \sum_{(j,k) \in R} q_{jk}(u) P(j,u)\,du$

by monotone convergence. But $\int_s^t g_i(u)\,du \to E[N_t^s]$ by Lemma 6.  QED.

Corollary 2.2  Under the hypotheses of Corollary 2.1, the distribution
function for $\eta_n$ satisfies

    $F_n(t) = \int_s^t \sum_{(j,k) \in R} q_{jk}(u) P(j,n,u)\,du$.

Proof: The proof of Lemma 6 includes the fact that

    $\mu_n(s,t] = \lim_{k \to \infty} \mu_{n,k}(s,t] = \lim_{k \to \infty} F_{n,k}(t)$.

The argument of Corollary 2.1 shows this limit to be

    $\int_s^t \sum_{(j,k) \in R} q_{jk}(u) P(j,n,u)\,du$.

Generating equations. We will now use the previous results to develop
expressions for higher moments of the random variables $N_t^s$ defined for
particular $s \ge 0$ and irreflexive relation $R$ on $L_k$, for a refinable process
$X : \Omega \times T \to L_k$.

If $j \in L_k$, let $j(r)$ be the $r$th integer coordinate of $j$ and let

    $A(j) = \{ i \in L_k : \max\{ |i(r) - j(r)| : 1 \le r \le k \} = 1 \}$

be the lattice points adjacent to $j$.

By A2, A3.1, and A4, $\sum_{k \notin A(j) \cup \{j\}} P_{jk}(u,t) = o(t-u)$ for $t > u$.

Corollary 1.1 then gives the following cases.

If $s \le u < t$,

(1)  $P(j,n,t) = P_{jj}(u,t) P(j,n,u) + \sum_{i \in A(j),\ (i,j) \notin R} P_{ij}(u,t) P(i,n,u)$
               $+ \sum_{(i,j) \in R} P_{ij}(u,t) P(i,n-1,u) + o(t-u)$,  $n > 0$,
and
     $P(j,0,t) = P_{jj}(u,t) P(j,0,u) + \sum_{i \in A(j),\ (i,j) \notin R} P_{ij}(u,t) P(i,0,u) + o(t-u)$.

If $u < t \le s$,

(2)  $P(j,n,t) = 0$,  $n > 0$,  and
     $P(j,0,t) = P_{jj}(u,t) P(j,0,u) + \sum_{i \in A(j)} P_{ij}(u,t) P(i,0,u) + o(t-u)$.

All of the above sums can be assumed to be finite.

Equations (1) and (2) give the following derivatives, using + and - to
denote right- and left-hand derivatives, respectively, where $q_{ij}$ and
$P(j,n,\cdot)$ satisfy appropriate continuity conditions.

(3)  If $t > s$,

     $P'(j,n,t) = -q_{jj}(t) P(j,n,t) + \sum_{i \in A(j),\ (i,j) \notin R} q_{ij}(t) P(i,n,t) + \sum_{(i,j) \in R} q_{ij}(t) P(i,n-1,t)$.

(4)  If $t = s$,

     $P'(j,n,s) = 0$,  $n > 1$,
     $P'(j,1,s)_+ = \sum_{(i,j) \in R} q_{ij}(s) P(i,0,s)$,
     $P'(j,1,s)_- = 0$,
     $P'(j,0,s)_+ = -q_{jj}(s) P(j,0,s) + \sum_{i \in A(j),\ (i,j) \notin R} q_{ij}(s) P(i,0,s)$,
     $P'(j,0,s)_- = -q_{jj}(s) P(j,0,s) + \sum_{i \in A(j)} q_{ij}(s) P(i,0,s)$;

(5)  If $t < s$,

     $P'(j,n,t) = 0$,  $n > 0$,
     $P'(j,0,t) = -q_{jj}(t) P(j,0,t) + \sum_{i \in A(j)} q_{ij}(t) P(i,0,t)$.

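For $j \in L_k$, define $G(j,x,t) = \sum_{n \ge 0} x^n P(j,n,t)$, $|x| \le 1$, so that
$G(x,t) = \sum_{j \in L_k} G(j,x,t)$ is the probability generating function for the
random variable $N_t^s$. Then

Proposition 3. If $q_{ij}(t)$ are bounded functions for $i,j \in L_k$, then

(6)  $\frac{\partial}{\partial t} G(j,x,t) = -q_{jj}(t) G(j,x,t) + \sum_{i \in A(j),\ (i,j) \notin R} q_{ij}(t) G(i,x,t)$
                                          $+ x \sum_{(i,j) \in R} q_{ij}(t) G(i,x,t)$,  for $|x| \le 1$, $t \ge s$.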

Proof: For fixed $|x| < 1$, boundedness of the $q_{ij}(t)$ implies uniform convergence
of

    $\sum_{n \ge 0} x^n P'(j,n,t) = -q_{jj}(t) P(j,0,t) + \sum_{i \in A(j),\ (i,j) \notin R} q_{ij}(t) P(i,0,t)$
        $+ \sum_{n \ge 1} x^n \big[ -q_{jj}(t) P(j,n,t) + \sum_{i \in A(j),\ (i,j) \notin R} q_{ij}(t) P(i,n,t)$
        $+ \sum_{(i,j) \in R} q_{ij}(t) P(i,n-1,t) \big]$,  from (3) and (4).

The left side of the equation is therefore $\frac{\partial}{\partial t} G(j,x,t)$.

Absolute convergence is also guaranteed by hypothesis, and the right side
can be rearranged to give (6). For $x = 1$, (6) is an application of A3.1, A3.2,
and A4.  QED.

Equations (6) will be called the generating equations.

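Now

(7)  $G(x,t) = \sum_{n \ge 0} x^n \Pr\{N_t^s = n\}$,  and for $t \ge s$,

(8)  $\sum_{n \ge 0} x^n \frac{d}{dt} \Pr\{N_t^s = n\} = -F_0'(t) + \sum_{n \ge 1} x^n \big( F_{n-1}'(t) - F_n'(t) \big)$
       $= \sum_{n \ge 0} x^n \sum_{j \in L_k} \big[ \sum_{(i,j) \in R} q_{ij}(t) P(i,n-1,t)$
       $+ \sum_{i \in A(j),\ (i,j) \notin R} q_{ij}(t) P(i,n,t) - q_{jj}(t) P(j,n,t) \big]$,

applying (3) and (4) and interpreting derivatives as right hand derivatives
at $s$.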

The proof of Proposition 2 shows that $|F_{n-1}'(t) - F_n'(t)| \le 2\,\frac{d}{dt} E[N_t^s]$,

so if the latter quantity is a bounded function, the result of term by
term differentiation of (7) with respect to $t$ is uniformly and absolutely
convergent for any $|x| < 1$. If the functions $q_{ij}(t)$ are also bounded, then
application of (6) and rearrangement of (8) show that

    $\frac{\partial}{\partial t} G(x,t) = \sum_{j \in L_k} \frac{\partial}{\partial t} G(j,x,t)$,  $|x| < 1$.

Finally, we can apply a power series argument to (8) and conclude

Lemma 7. If $\sum_{j \in L_k} q_{jj}(t) P(j,t)$ is a bounded function, then

(9)  $\frac{\partial^{i+1}}{\partial x^i\,\partial t} G(x,t) = \sum_{j \in L_k} \frac{\partial^{i+1}}{\partial x^i\,\partial t} G(j,x,t)$  for $|x| < 1$, $i \ge 0$.

Proof: The hypothesis guarantees the preceding argument, since the conclusion
of Proposition 3 is true under this condition also.  QED.

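We are finally in a position to formally develop moments of the random
variables $N_t^s$.

Differentiating the generating equations (6) with respect to $x$ and summing
over $j$ produces, applying (9),

    $\frac{\partial^2}{\partial x\,\partial t} G(x,t) = \sum_{j \in L_k} \big[ -q_{jj}(t) \frac{\partial}{\partial x} G(j,x,t) + \sum_{i \in A(j)} q_{ij}(t) \frac{\partial}{\partial x} G(i,x,t) \big]$
        $+ (x-1) \sum_{j \in L_k} \sum_{(i,j) \in R} q_{ij}(t) \frac{\partial}{\partial x} G(i,x,t)$
        $+ \sum_{j \in L_k} \sum_{(i,j) \in R} q_{ij}(t) G(i,x,t)$.

The first sum over $j$ is zero by A3.1. Integrating, we get

    $\frac{\partial}{\partial x} G(x,t) = (x-1) \int_s^t \sum_{(i,j) \in R} q_{ij}(u) \frac{\partial}{\partial x} G(i,x,u)\,du + \int_s^t \sum_{(i,j) \in R} q_{ij}(u) G(i,x,u)\,du$.

An application of dominated convergence then gives

    $\lim_{x \to 1^-} \frac{\partial}{\partial x} G(x,t) = \int_s^t \sum_{(i,j) \in R} q_{ij}(u) P(i,u)\,du$,

which is a restatement of Corollary 2.1.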

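We can apply the same procedure to find

    $\frac{\partial^2}{\partial x^2} G(x,t) = (x-1) \int_s^t \sum_{(i,j) \in R} q_{ij}(u) \frac{\partial^2}{\partial x^2} G(i,x,u)\,du$
        $+ 2 \int_s^t \sum_{(i,j) \in R} q_{ij}(u) \frac{\partial}{\partial x} G(i,x,u)\,du$,  so

(10)  $\lim_{x \to 1^-} \frac{\partial^2}{\partial x^2} G(x,t) = 2 \int_s^t \sum_{(i,j) \in R} q_{ij}(u) Q_s(i,u)\,du$,

where by Proposition 3, $Q_s(j,t) = \lim_{x \to 1^-} \frac{\partial}{\partial x} G(j,x,t)$ for $j \in L_k$ satisfies

(11)  $Q_s'(j,t) = -q_{jj}(t) Q_s(j,t) + \sum_{i \in A(j)} q_{ij}(t) Q_s(i,t) + \sum_{(i,j) \in R} q_{ij}(t) P(i,t)$,

with initial conditions $Q_s(j,s) = 0$, for $j \in L_k$, $t \ge s$.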

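We therefore have

Proposition 4. If $\sum_{j \in L_k} q_{jj}(t) P(j,t)$ is a bounded function
for $t \ge s$, then

    $\operatorname{Cov}(N_s^0, N_t^0) = E[N_s^0] - E[N_s^0] E[N_t^0] + K(0,s) + K(0,t) - K(s,t)$,

where $K(x,y) = \int_x^y \sum_{(i,j) \in R} q_{ij}(u) Q_x(i,u)\,du$
and where $Q_x(j,t)$ satisfies (11) (for $s = x$).

Proof: For $0 \le s < t$,

    $\operatorname{Var}[N_t^s] = \lim_{x \to 1^-} \big[ \frac{\partial^2}{\partial x^2} G(x,t) + \frac{\partial}{\partial x} G(x,t) - \big( \frac{\partial}{\partial x} G(x,t) \big)^2 \big]$

by properties of probability generating functions. Also, since $N_t^0 = N_s^0 + N_t^s$
almost surely, $\operatorname{Cov}(N_s^0, N_t^0) = (\operatorname{Var}[N_t^0] + \operatorname{Var}[N_s^0] - \operatorname{Var}[N_t^s])/2$,
and the result follows from (10).  QED.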
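
The algebra behind the last step of this proof can be written out as follows. This is a worked sketch only; the abbreviations $m_s = E[N_s^0]$ and $m_t = E[N_t^0]$ are introduced here for brevity and are not the author's notation. It uses (10) together with Corollary 2.1, so that $E[N_t^s] = m_t - m_s$.

```latex
\begin{align*}
\operatorname{Var}[N_t^s] &= 2K(s,t) + (m_t - m_s) - (m_t - m_s)^2, \\
\operatorname{Var}[N_t^0] &= 2K(0,t) + m_t - m_t^2, \qquad
\operatorname{Var}[N_s^0] = 2K(0,s) + m_s - m_s^2, \\
2\operatorname{Cov}(N_s^0, N_t^0)
  &= \operatorname{Var}[N_t^0] + \operatorname{Var}[N_s^0] - \operatorname{Var}[N_t^s] \\
  &= 2\bigl[K(0,s) + K(0,t) - K(s,t)\bigr] + 2 m_s - 2 m_s m_t .
\end{align*}
```

Dividing by 2 gives the expression stated in Proposition 4.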


Application. If $X$ is a birth-death process whose sample functions give the
number of active faults in software as functions of time, then the one dimensional
discrete distributions of $X$ can be computed by solving the system of
equations (6) for $x = 1$. The system is

(12)  $P'(j,t) = -q_{jj}(t) P(j,t) + \sum_{i \in A(j)} q_{ij}(t) P(i,t)$,  $j \ge 0$,

with initial conditions $P(i,0) = 0$, $i \ge 1$, and $P(0,0) = 1$, where the states of $X$
are the non-negative integers.
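
A minimal numerical sketch of system (12) may help fix ideas. It is not the author's FORTRAN program: the function name `solve_birth_death`, the truncation of the state space at `n_states`, and the fixed-step fourth-order Runge-Kutta integrator are all assumptions made for illustration. The birth intensity `lam(t)` is taken to be state independent and `mu(j, t)` is the death (repair) intensity out of state `j`, with `mu(0, t) = 0`.

```python
def solve_birth_death(lam, mu, t_end, n_states=60, steps=1000):
    """Integrate the forward equations (12) from P(0,0) = 1 to time t_end.

    lam(t)    -- birth intensity (hypothetical callable, state independent)
    mu(j, t)  -- death intensity out of state j (hypothetical callable)
    The state space is truncated to 0..n_states-1 (an approximation).
    """
    p = [0.0] * n_states
    p[0] = 1.0
    h = t_end / steps

    def deriv(t, q):
        d = [0.0] * n_states
        for j in range(n_states):
            # No births out of the top state, so probability is conserved.
            out_birth = lam(t) if j < n_states - 1 else 0.0
            d[j] -= (out_birth + mu(j, t)) * q[j]
            if j > 0:
                d[j] += lam(t) * q[j - 1]
            if j < n_states - 1:
                d[j] += mu(j + 1, t) * q[j + 1]
        return d

    t = 0.0
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step.
        k1 = deriv(t, p)
        k2 = deriv(t + h / 2, [x + h / 2 * k for x, k in zip(p, k1)])
        k3 = deriv(t + h / 2, [x + h / 2 * k for x, k in zip(p, k2)])
        k4 = deriv(t + h, [x + h * k for x, k in zip(p, k3)])
        p = [x + h / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
        t += h
    return p
```

With a constant birth intensity and death intensities proportional to the state, the computed mean can be checked against the known mean of the infinite-server birth-death process, which is a convenient sanity test before supplying time-dependent intensities.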


The solution of (12) requires that the intensity functions $q_{ij}(t)$ be
specified. The results above can be used to generate estimates of the $q_{ij}(t)$
and ultimately of the distributions $\{P(i,t),\ t \ge 0,\ i \ge 0\}$ under hypotheses on
the $q_{ij}(t)$.

For example, assume that

(13)  $q_{ij}(t) = \lambda(t)$ for all pairs $(i,j)$ with $i \ge 0$ and $i = j-1$, so that the
intensity of transitions corresponding to fault activations is a function
of time but not of state. Then

(14)  $E[N_t^0] = \int_0^t \sum_{i \ge 0} \lambda(u) P(i,u)\,du = \int_0^t \lambda(u)\,du$,

where $N_t^0$ is the random variable for the count of activated faults, by Corollary 2.1.

Denoting $E[N_t^0]$ as a function of time by $F(t)$, we have

(15)  $\lambda(t) = F'(t)$.

Therefore, if the mean function of the activated fault counting process,
$F(t)$, were known, we would know the intensity functions for fault activation.

Similarly, if $R(t)$ is the mean function for the repair counting process,
then by Corollary 2.1,

(16)  $R(t) = \int_0^t \sum_{i \ge 1} \mu_i(u) P(i,u)\,du$,

where $\mu_i(t)$ is the intensity function for transitions from a state of $i$
active faults to a state of $i-1$ active faults, for $i \ge 1$.

If we assume that there is a finite number, $K$, of debuggers making
repairs and that therefore the intensities $\mu_i(t)$ satisfy

(17)  $\mu_i(t) = i\,\mu(t)$ for $i \le K$,  and  $\mu_i(t) = K\,\mu(t)$ for $i > K$,

for some function of time $\mu(t)$, then (16) becomes

(18)  $R(t) = \int_0^t \mu(u) \big[ K + \sum_{i=0}^{K} (i-K) P(i,u) \big]\,du$

and

(19)  $\mu(t) = R'(t) \big/ \big( K + \sum_{i=0}^{K} (i-K) P(i,t) \big)$.
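
Equation (19) can be evaluated directly once $R'(t)$ and the distribution $P(i,t)$ at a given time are available. The sketch below is illustrative only; the function name `repair_intensity` and the representation of $P(\cdot,t)$ as a Python list are assumptions, not part of the original program.

```python
def repair_intensity(r_prime, p, k):
    """Evaluate (19): mu(t) = R'(t) / (K + sum_{i=0}^{K} (i - K) P(i,t)).

    r_prime -- value of R'(t) at the time of interest
    p       -- list with p[i] = P(i,t); states beyond len(p)-1 are
               treated as having probability zero (an approximation)
    k       -- number of debuggers K
    """
    top = min(k, len(p) - 1)
    denom = k + sum((i - k) * p[i] for i in range(top + 1))
    return r_prime / denom
```

The denominator equals the expected value of min(i, K) under P(., t), that is, the mean number of busy debuggers, which is why it reduces to the mean active fault count when K exceeds every state with positive probability.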


Assuming (13) and (17), we could solve the system (12) for the distributions
of the active fault count if only we knew the mean functions $F(t)$
and $R(t)$ for the counting processes of activated faults and of repairs.

For purposes of estimation, one may as well assume, given (13), that
$F(t)$ is the mean function of a Poisson process. An estimate of $F(t)$ could
then be determined by well known methods. We therefore will consider a
method of estimating $R(t)$.

The counting process for software repairs will almost certainly have
correlated increments, especially if $K$ is finite in (17), and a likelihood
function for estimation of $R(t)$ will be difficult if not impossible to
express in closed form. A next best procedure for estimating $R(t)$ is
to take advantage of the first two moments of the repair counting process,
and to match moments with a Normal process.

To estimate $R(t)$ we then want to minimize an expression of the form

(20)  $\sum_{i,j=1}^{M} (C_i - R(t_i))\, V_{ij}\, (C_j - R(t_j))$,

where $C_i$ is a count of repairs made prior to $t_i$, $i = 1, \dots, M$, and where
$V_{ij}$ is an entry of the covariance matrix $[\operatorname{Cov}(N_{t_i}^0, N_{t_j}^0)]$
of the repair counting process.
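
As an illustration of the quadratic form in (20), the sketch below evaluates it for given counts, mean-function values, and a weight matrix. The name `weighted_residual` and the argument `weights` are hypothetical. Whether the matrix entering (20) is the covariance matrix itself or its inverse cannot be recovered from the text as extracted; in conventional generalized least squares the weights would be the entries of the inverse covariance matrix, and the caller is assumed to supply whichever is intended.

```python
def weighted_residual(counts, r_values, weights):
    """Quadratic form sum_{i,j} (C_i - R(t_i)) * W_ij * (C_j - R(t_j)).

    counts   -- observed repair counts C_i
    r_values -- mean function values R(t_i) under the current parameters
    weights  -- square matrix (list of lists) W supplied by the caller
    """
    resid = [c - r for c, r in zip(counts, r_values)]
    m = len(resid)
    return sum(resid[i] * weights[i][j] * resid[j]
               for i in range(m) for j in range(m))
```

With the identity matrix as `weights` this reduces to an ordinary sum of squared residuals.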

Proposition 4 suggests an iterative scheme for the estimation of $R(t)$
by minimizing (20). We assume that an estimate of $R(t)$ can be made by
estimating a vector of parameters $(\alpha_1, \alpha_2, \dots, \alpha_n) = \alpha$ upon which $R(t)$
depends. For practical purposes, the scheme then takes the following form.

Given an initial estimate $\alpha^{(0)} = (\alpha_1^{(0)}, \dots, \alpha_n^{(0)})$ and assuming an
estimate of $F(t)$, apply (13), (15), (17), and (19) to compute the solution of
(12). Next, apply Corollary 1.1, computing solutions of (11), and apply
Proposition 4 to compute an estimate of $V$, the covariance matrix of the
repair counting process.

Minimize (20) for this $V$ for a new estimate $\alpha^{(1)} = (\alpha_1^{(1)}, \dots, \alpha_n^{(1)})$ of $\alpha$, and
finally, generate a sequence $\{\alpha^{(i)}\}$ of parameter vectors, with $\alpha^{(i+1)}$
generated from $\alpha^{(i)}$ as $\alpha^{(1)}$ was generated from $\alpha^{(0)}$.

If the sequence $\{\alpha^{(i)}\}$ is convergent then we will have succeeded in
estimating $R(t)$ by matching the first two moments of the repair counting
process with a Normal process. Note: the sequence $\{\alpha^{(i)}\}$ may converge
slowly. In practice therefore, the sequence is subjected to a Steffensen
acceleration. For every $n+2$ elements of the sequence $\{\alpha^{(i)}\}$ an approximate
Jacobian matrix is constructed using finite differences, etc., and an element
$\beta$ of the accelerated sequence of parameter vectors is generated which
becomes the starting vector of a new sequence $\{\alpha^{(i)}\}$. In addition, to
save computer time, elements of the sequence $\{\alpha^{(i)}\}$ are not required
to actually minimize (20) for various $V$ but only to be terminal elements of
a truncated sequence of iterates in an attempt to minimize (20) by a
Newton-Raphson method. Finally, it is some $\beta$ that, at convergence, is tested
as a minimum of (20).
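
The acceleration idea can be illustrated in one dimension. The sketch below applies Steffensen's method (Aitken's delta-squared formula restarted at each accelerated point) to a scalar fixed-point iteration x = g(x). It is a simplification: the scheme described above works on parameter vectors with a finite-difference Jacobian, which is not reproduced here, and the function name `steffensen` and its tolerances are assumptions.

```python
import math


def steffensen(g, x0, tol=1e-12, max_iter=50):
    """Accelerate the fixed-point iteration x = g(x) (scalar case only)."""
    x = x0
    for _ in range(max_iter):
        x1 = g(x)          # two plain iterates of the slow sequence
        x2 = g(x1)
        denom = x2 - 2.0 * x1 + x
        if denom == 0.0:   # iterates no longer change: accept x2
            return x2
        # Aitken delta-squared extrapolation; the result restarts the sequence.
        x_new = x - (x1 - x) ** 2 / denom
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Example: the fixed point of cos(x), which plain iteration reaches slowly.
root = steffensen(math.cos, 1.0)
```

Each accelerated point becomes the starting value of a fresh short sequence, mirroring the restart described in the text.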

Given estimates of the mean functions $F(t)$ and $R(t)$, the intensities $q_{ij}(t)$
in (12) can in principle be computed. By Theorem 1, we may then refine
the process $X$ by various transition counting processes and compute various
quantities pertinent to software reliability. For the following applications,
we assume $X$ to be a refinable process, and that some method such as
outlined above has been used to generate estimates of $F(t)$, $R(t)$, and the $q_{ij}(t)$.

Distribution of Time to Next Repair

For $s \ge 0$, we may consider the process $\{N_t^s,\ t \ge 0\}$ counting the number of
repair transitions subsequent to time $s$. We count repairs via the relation
$\{(i,i-1),\ i \ge 1\}$ and define the process $\{N_t^s,\ t \ge 0\}$ accordingly.

Corollary 2.2 then states that

(21)  $F_n(t) = \int_s^t \sum_{i \ge 1} \mu_i(u) P(i,n,u)\,du$,

where $F_n(t)$ is the distribution function for time to the $(n+1)$st repair
occurring after time $s$, where the intensity functions $\mu_i(t)$ are precisely
the same as in (16) for $t \ge s$, and where the $P(i,n,t)$ satisfy the system of
equations under (3), (4) and (5) above, with the intensities in (12). In
particular, the $P(i,n,t)$ satisfy the system

(22)  $P'(i,n,t) = \mu_{i+1}(t) P(i+1,n-1,t) + \lambda_{i-1}(t) P(i-1,n,t) - (\mu_i(t) + \lambda_i(t)) P(i,n,t)$,  $n > 0$, $i > 0$,
      $P'(0,n,t) = \mu_1(t) P(1,n-1,t) - \lambda_0(t) P(0,n,t)$,  $n > 0$,
      $P'(i,0,t) = \lambda_{i-1}(t) P(i-1,0,t) - (\mu_i(t) + \lambda_i(t)) P(i,0,t)$,  $i > 0$,  and
      $P'(0,0,t) = -\lambda_0(t) P(0,0,t)$,  for $t \ge s$,

with initial conditions

(23)  $P(i,n,s) = 0$,  $n > 0$, $i \ge 0$,
      $P(i,0,s) = P(i,s)$,  $i \ge 0$.

In case $n = 0$ in (21) we need only solve equations of (22) for $n = 0$, that
is, independently of solutions of (22) for $n > 0$.

Distributions of Time to Last Repair and Time to Last Fault

The probability $P(s)$ that there are no repairs in $(s,\infty)$ is, from (21)
with $n = 0$,

(24)  $P(s) = 1 - \int_s^\infty \sum_{i \ge 1} \mu_i(u) Q_s(i,u)\,du$,

where $Q_s(i,t) = P(i,0,t)$ of the system (22) with $Q_s(i,s) = P(i,s)$.

We can re-write (24) as

(25)  $P(s) = \lim_{t \to \infty} \sum_{i \ge 0} Q_s(i,t)$,

and therefore the conditional distribution of time to last repair as

(26)  $D(t) = \dfrac{P(t) - P(0)}{1 - P(0)}$.

In the case $K = \infty$ in (17) and assuming (13), so that the increments of the
repair counting process are Poisson distributed,

(27)  $\sum_{i \ge 0} Q_s(i,t) = e^{R(s) - R(t)}$

and $D(t)$ becomes

(28)  $D(t) = \dfrac{e^{R(t) - R(\infty)} - e^{-R(\infty)}}{1 - e^{-R(\infty)}}$.

Similar arguments show that the time-to-last-fault distribution is given
by (28), with $F$ in place of $R$, assuming (13).
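
Under the Poisson assumption the probability of no fault activation after time t is exp(-(F(inf) - F(t))), so the conditional distribution of the time to last fault, given that at least one fault occurs, can be computed from the mean function alone. The sketch below is illustrative; the names `last_fault_cdf`, `mean_fn` and `f_inf`, and the exponential mean function used in the example, are assumptions rather than anything specified in the text.

```python
import math


def last_fault_cdf(t, mean_fn, f_inf):
    """Conditional distribution of time to last fault for a Poisson process.

    mean_fn -- callable returning the fault mean function F(t)
    f_inf   -- the limit F(infinity), the expected total number of faults
    """
    none_after_t = math.exp(mean_fn(t) - f_inf)  # Pr{no faults after t}
    none_at_all = math.exp(-f_inf)               # Pr{no faults at all}
    return (none_after_t - none_at_all) / (1.0 - none_at_all)


# Hypothetical mean function with F(infinity) = 5, for illustration only.
example_mean = lambda t: 5.0 * (1.0 - math.exp(-t))
halfway = last_fault_cdf(2.0, example_mean, 5.0)
```

The function is zero at t = 0 and tends to one as t grows, as a distribution function conditioned on at least one fault must.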

We may similarly show that the conditional distribution of time to $n$th
from last fault is

(29)  $D_n(t) = \dfrac{\int_{F(\infty) - F(t)}^{F(\infty)} u^n e^{-u}\,du}{\int_0^{F(\infty)} u^n e^{-u}\,du}$.

Distribution of Time to Next Extinction

The distribution of time to $(n+1)$st transition after time $s$ from one state
of the active fault process to an adjacent state is, by Corollary 2.2,

(30)  $F_n(t) = \int_s^t \sum_{i \ge 0} \sum_{j \in A(i)} q_{ij}(u) P(i,n,u)\,du$.

The distribution of time to next extinction of the active fault population
is of particular interest in relation to software reliability. By (30),
this distribution is

(31)  $F_0(t) = \int_s^t \mu_1(u) P(1,0,u)\,du$,

where, by (3), (4) and (5), $P(i,0,t)$ satisfies

(32)  $P'(i,0,t) = \mu_{i+1}(t) P(i+1,0,t) + \lambda_{i-1}(t) P(i-1,0,t) - (\mu_i(t) + \lambda_i(t)) P(i,0,t)$,  $i > 0$,
      $P'(0,0,t) = -\lambda_0(t) P(0,0,t)$,

with $P(i,0,s) = P(i,s)$ for $i \ge 0$.

The functions $\mu_i(t)$, $i \ge 1$, and $\lambda_i(t)$, $i \ge 0$, are exactly the same as for equation
(12), for $t \ge s$.

Covariance Function of the Active Fault Count

Define relations $A = \{(i,i+1),\ i \ge 0\}$, $B = \{(i,i-1),\ i \ge 1\}$, $C = A \cup B$, and let
$Q_s(i,t)$ be the solutions of (11) for $R = A$, $\tilde{Q}_s(i,t)$ the solutions of
(11) for $R = B$, and similarly define $\hat{Q}_s(i,t)$ for $R = C$, for $i \ge 0$, $0 \le s \le t$.

Let $K(x,y) = \int_x^y \sum_{(i,j) \in A} q_{ij}(u) Q_x(i,u)\,du$,
$\tilde{K}(x,y) = \int_x^y \sum_{(i,j) \in B} q_{ij}(u) \tilde{Q}_x(i,u)\,du$, and
define $\hat{K}(x,y)$ analogously.

Let $J(x,y) = 2K(x,y) + 2\tilde{K}(x,y) - \hat{K}(x,y)$, and set $D_x(i,y) = 2Q_x(i,y) - \hat{Q}_x(i,y)$
for $i \ge 0$, $y \ge x$. Observe that $D_x(i,y) = \hat{Q}_x(i,y) - 2\tilde{Q}_x(i,y)$, applying (11).

Then $J(x,y) = \int_x^y \sum_{i \ge 0} (\lambda_i(u) - \mu_i(u)) D_x(i,u)\,du$,
where $\lambda_i(t) = q_{ij}(t)$ for $(i,j) \in A$ and $\mu_i(t) = q_{ij}(t)$ for $(i,j) \in B$ and where $\mu_0(t) = 0$.

Next, write

    $J(0,x) + J(0,y) - J(x,y) = 2J(0,x) + \int_x^y \sum_{i \ge 0} (\lambda_i(u) - \mu_i(u)) (D_0(i,u) - D_x(i,u))\,du$,

and note that $D_0(i,t) = i\,P(i,t)$, $t \ge 0$, $i \ge 0$, by considering $D_0(i,t)/P(i,t)$ as a conditional
expectation or by application of (11), and that $\bar{D}_s(i,t) = D_0(i,t) - D_s(i,t)$
satisfies (12) for $i \ge 0$, $t \ge s$, with initial conditions $\bar{D}_s(i,s) = i\,P(i,s)$, $i \ge 0$.

Therefore, $J(0,x) + J(0,y) - J(x,y) = 2J(0,x) + \sum_{i \ge 0} i\,\bar{D}_x(i,y) - \sum_{i \ge 0} i^2 P(i,x)$.

Associating counting processes $\{N_t^0,\ t \ge 0\}$, $\{\tilde{N}_t^0,\ t \ge 0\}$ and
$\{N_t^0 + \tilde{N}_t^0,\ t \ge 0\}$ with relations $A$, $B$, and $C$ above, we apply simple
properties of covariances to conclude

    $\operatorname{Cov}(X_s, X_t) = 2\operatorname{Cov}(N_s^0, N_t^0) + 2\operatorname{Cov}(\tilde{N}_s^0, \tilde{N}_t^0) - \operatorname{Cov}(N_s^0 + \tilde{N}_s^0,\ N_t^0 + \tilde{N}_t^0)$,

where $X_t = N_t^0 - \tilde{N}_t^0$ as usual.

Applying Proposition 4 and some algebra,

    $\operatorname{Cov}(X_s, X_t) = F(s) + R(s) - (F(t) - R(t))(F(s) - R(s)) + J(0,s) + J(0,t) - J(s,t)$,  $t \ge s$,

from which $\sum_{i \ge 1} i^2 P(i,s) = 2J(0,s) + F(s) + R(s)$.

Using the calculations above, we get

(33)  $\operatorname{Cov}(X_s, X_t) = \sum_{i \ge 0} i\,\bar{D}_s(i,t) - (F(t) - R(t))(F(s) - R(s))$.

Finally, we conclude

Proposition 5. If $X$ is a refinable process and $\sum_{i \ge 0} (\lambda_i(t) + \mu_i(t)) P(i,t)$ is a
bounded function, then for $t \ge s$,

(34)  $\operatorname{Cov}(X_s, X_t) = \sum_{i \ge 0} i\,D^{*}_s(i,t)$,

where $D^{*}_s(i,t)$ satisfies equations (12) for $i \ge 0$, $t \ge s$, with initial conditions
$D^{*}_s(i,s) = (i - F(s) + R(s)) P(i,s)$, $i \ge 0$.

Proof: Observe that $(F(t) - R(t))(F(s) - R(s)) = \sum_{i \ge 0} i\,(F(s) - R(s)) P(i,t)$ and
that $D^{*}_s(i,t) = \bar{D}_s(i,t) - (F(s) - R(s)) P(i,t)$ satisfy (12). Apply (33).  QED.

Corollary 5.1. Assume (13) above and that $K = \infty$ in (19). Then

    $\operatorname{Cov}(X_s, X_t) = (F(s) - R(s))\, e^{-\int_s^t \mu(u)\,du}$,  for $t \ge s$,

where $\mu(t) = R'(t)/(F(t) - R(t))$.

Proof: Assuming the limiting case of (19), we have $\mu_i(t) = i\,\mu(t)$ for $i \ge 0$,
with $\mu(t)$ as stated. Let $Z(t) = \sum_{i \ge 0} i\,D^{*}_s(i,t)$, where $D^{*}_s(i,t)$ is the
function appearing in (34) of Proposition 5.

Then from (12),

    $Z(t) = C + \int_s^t \sum_{i \ge 0} (\lambda_i(u) - \mu_i(u)) D^{*}_s(i,u)\,du = C - \int_s^t \mu(u) Z(u)\,du$,

where $C$ is a constant, so $Z'(t) = -\mu(t) Z(t)$, $t \ge s$.

Therefore,

    $Z(t) = (F(s) - R(s))\, e^{-\int_s^t \mu(u)\,du}$,

solving the differential equation, and using $Z(s) = F(s) - R(s)$.  QED.
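
The step from the first to the second integral in this proof relies on the fact that the function of Proposition 5 sums to zero over the states. The following worked sketch spells this out; it writes $D^{*}_s(i,t)$ for the function appearing in (34), a notation adopted here for illustration, and uses $\lambda_i(u) = \lambda(u)$ from (13) and $\mu_i(u) = i\,\mu(u)$.

```latex
\begin{align*}
\sum_{i \ge 0} D^{*}_s(i,s)
  &= \sum_{i \ge 0} \bigl(i - F(s) + R(s)\bigr) P(i,s)
   = E[X_s] - \bigl(F(s) - R(s)\bigr) = 0, \\
\frac{d}{dt} \sum_{i \ge 0} D^{*}_s(i,t) &= 0
  \quad \text{(summing (12) over all states)}, \\
\sum_{i \ge 0} \bigl(\lambda_i(u) - \mu_i(u)\bigr) D^{*}_s(i,u)
  &= \lambda(u) \sum_{i \ge 0} D^{*}_s(i,u) - \mu(u) \sum_{i \ge 0} i\, D^{*}_s(i,u)
   = -\mu(u) Z(u).
\end{align*}
```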


6. Summary. By requiring the active fault count process X to satisfy appropriate
conditions, we can estimate the mean functions of the counting
processes for activated faults and for software repairs, as long as the
estimation uses only a few moments of the distributions of these processes.
The same conditions can then be used to restate the differential equations
for the one-dimensional densities of X in terms of the above mean
functions, and secondly to state the differential equations and initial
conditions for computation of distributions such as time to next repair,
time to next fault, time to next extinction of the active fault population,
etc., that are important to estimating software reliability.

If  the  active  fault  count  process  is  assumed  to  be  a  refinable  process, 
then  the  process  can  be  refined  by  counting  selected  transitions  occurring 
after  a  given  time  s. 

The appropriate choice of refinement can be used to develop higher
moments of the activated fault or repair counting process, resulting in
an iterative scheme for estimating the mean functions of these counting
processes, or to develop distribution functions of random variables
interesting from the perspective of software reliability. Since a refinement
of a refinable process is itself refinable, one can even develop
expressions for such quantities as the distribution of time to first
repair subsequent to first extinction, etc.

A computer program in FORTRAN has been written by this author that takes
as input the times of software faults and times of software repairs and
performs the computations outlined above. The program assumes (13) and (17) for
estimating the fault counting process mean function and the repair
counting process mean function. It is capable of estimating F(t) by
maximization of a Poisson likelihood function and of estimating R(t) for
a finite number of debuggers by minimization of (20) under the constraint
that the covariance function satisfies Proposition 4.

The program operates as follows. Interval counts of activated faults
are formed at selected times from data and a Poisson mean function is
estimated via maximization of a likelihood function. Following this,
interval counts of repairs are made, forming the $C_i$ of (20). Next, the
iterative scheme for minimization of (20), applying Proposition 4, is
performed. As a by-product of this minimization, values of $\mu(t)$,
satisfying (19), and $P(i,t)$, $t \ge 0$, $i \ge 0$, satisfying (12) are computed.

When the iteration converges, these values are retained; the $\mu_i(t)$ are
subsequently used for the solution of systems such as (22) and (32) above,
and the distributions $\{P(i,t_j),\ i \ge 0\}$ for various $t_j$ are used to establish
initial conditions for solution of these systems. Computations involving
various distribution functions, applying Corollary 2.2 for various $s \ge 0$, are
then made.

As a result, the program is also capable of producing values of the
following (as functions of time, at various times): conditional mean and
standard deviation of time to next fault, mean cumulative activated faults,
mean cumulative repairs, the probability that the number of faults remaining
to be activated is at most N (for various N), the probability that no new
faults will be activated for an interval of length L (for various L), mean
and standard deviation of the count of entries (via repair) of a state of
N active faults (for various N), mean and standard deviation of the
conditional distribution of time to next entry (via repair) of a state of N
active faults (for various N), and the discrete distribution of the active
fault count.

ABSTRACT. Methods of software reliability estimation have been applied to
data collected on two large software development projects.

Characteristics of software reliability have been derived through
application of population process techniques to software failure and repair
data. They include the following: mean time between events (faults/
repairs), mean time to next fault, number of faults remaining (to be
discovered), length of successful mission, and time to last fault.

The results of the application demonstrate the feasibility of the
approach. The characteristics provide a basis for judgement of the quality and
maturity of software.

1. Introduction. A computer program has been developed for estimating
software reliability characteristics using the mean functions of transition
counting processes associated with a birth-death process. Data on fault
discovery and repair rates for two large software development projects were
analyzed with this program; the results are presented here.

The first data used were published (1) for a large software development
project, referred to here as System A, at the completion of the test phase.
These data are presented in histogram form for faults and repairs as two sets
of event counts within each of 18 time units of equal length.

The second data analyzed, designated as System B, were collected on a
software development project during the middle testing phase. These data
consist of fault discovery times and repair installation times, where the time
unit used in the analysis is months of operation time at event occurrence.

For each set of data, mean functions of fault and repair counting processes
were developed by assuming a parametric form and fitting the data for
parameter estimation. The counting process mean functions were then used to
develop estimates of six software reliability characteristics:

(1) D. K. Lloyd and M. Lipow, Reliability: Management, Methods, and Mathematics
(Second Edition), p. 519; Redondo Beach, CA, 1977.

* Sponsored by the United States Army under Contract No. DAEA18-77-C-0134.

Mean Time Between Events (MTBF, MTBR)

The instantaneous mean time between faults at time t is the reciprocal of
the derivative of the fault mean function at time t, that is, of the mean
number of faults activated per unit time. Similarly, the instantaneous mean
time between repairs at time t is the reciprocal of the derivative of the
repair mean function at time t, that is, of the mean number of repairs
installed per unit time. This basic estimate leaves the variation of the
estimate unknown and, taken alone, is easily misinterpreted.

Mean  Time  to  Next  Fault 


If  a  growth  in  reliability  is  to  occur,  it  is  reasonable  to  hope  that  the 
appearance  of  new  faults  is  less  than  certain  and  that  the  probability  of  new 
faults  will  decrease.  Therefore,  we  consider  the  distribution  of  time  to  next 
fault  conditioned  on  the  event  that  a  next  fault  occurs. 

Number  of  Faults  Remaining 

After the fault mean function is estimated, the expectation of number of
faults not yet activated after a time $t_i$ is $F(\infty) - F(t_i)$. This information
is presented as the probability that the number of faults to be
activated after time t is at most N, for selected values of N.
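
If the fault counting process is taken to be Poisson, as assumed in the Application section, the number of faults still to be activated after time t is Poisson distributed with mean F(inf) - F(t), and the probability above is a Poisson cumulative sum. The sketch below illustrates this; the function name `prob_remaining_at_most` and its arguments are hypothetical.

```python
import math


def prob_remaining_at_most(n, f_inf, f_t):
    """Pr{number of faults activated after t <= n} for a Poisson process.

    f_inf -- F(infinity), the expected total number of fault activations
    f_t   -- F(t), the expected number activated up to time t
    """
    m = f_inf - f_t            # mean number of faults remaining
    term = math.exp(-m)        # Pr{exactly 0 remaining}
    total = term
    for k in range(1, n + 1):
        term *= m / k          # Pr{exactly k remaining}
        total += term
    return total
```

For n = 0 this is the probability that the last fault has already been activated by time t.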

Length  of  Successful  Mission 

The reliability of software can be expressed in a useful way as the
probability that the software will function under typical input conditions for
a specific length of time without the appearance of new faults. The input
conditions must be assumed to be equivalent to the test conditions with
respect only to data input.
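
Under the Poisson assumption on fault activations, the probability of a successful mission of length L starting at time t is the probability of no activations in (t, t + L], which depends only on the increment of the mean function. A minimal sketch follows; the name `mission_reliability` and the callable `mean_fn` are assumptions made for illustration.

```python
import math


def mission_reliability(mean_fn, t, length):
    """Pr{no new fault in (t, t + length]} for a Poisson fault process.

    mean_fn -- callable returning the fault mean function F(t)
    """
    expected_new_faults = mean_fn(t + length) - mean_fn(t)
    return math.exp(-expected_new_faults)
```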

Time to Last Fault


An  estimate  useful  for  management  of  the  test  cycle  is  an  estimate  of  the 
time  to  bring  software  to  a  given  level  of  competence.  The  distribution  of 
time  until  last  fault  activation  also  serves  as  a  lower  bound  on  the  time  to 
last  repair  estimate.  This  estimate  is  presented  as  the  mean  time  to  last 
fault  and  a  table  of  probabilities  that  the  last  fault  has  been  activated 
prior  to  given  times. 


2. Application. The Fault Mean Function was assumed to be that of a Poisson
process with the parametric form $F(t) = \alpha(1 - e^{-\beta t^p})$*. With this form, the
expectation of total fault activations is $F(\infty) = \alpha$. The parameters $\beta$ and $p$
determine $F(t)$ as the product of a Weibull distribution and a constant.


For System A, the fault mean function was estimated by two methods:
maximization of a Poisson likelihood function and generalized least squares.
Each method yielded the same parametric estimates, below to six place
accuracy, for both systems.

*For clarification: $F(t) = \alpha(1 - \exp[-\beta t^p])$.

    System        A           B

    $\alpha$      764.704     898.903
    $\beta$       .094        9.259
    $p$           1.183       3.273

For System A, Figure 1 presents the Fault Mean Function with one standard
deviation bounds plotted versus observations. Examination shows that each
observation unit fell at or within one standard deviation.

Figure  2  presents  the  same  plot  for  System  B.  Examination  shows  that  the 
fit  was  not  as  good  for  System  B,  with  a  third  of  the  observations  at  or 
slightly  exceeding  one  standard  deviation  from  the  mean. 

The Repair Mean Function form was selected to constrain the eventual mean
total count of repairs to equal the eventual mean total count of faults. The
form also satisfies the constraint that the fault mean be an upper bound for
the repair mean at all times. The forms $R(t) = F(t)(1 - e^{-\beta' t^{p'}})$* for
System A, and $R(t)$ defined via $R'(t) = p'\,e^{-\beta' t}(F(t) - R(t))$
for System B, meet the desired constraints and are convenient, as both have
easily computed values and depend on only two parameters. For both System A
and B, the repair counting process mean function was estimated by the
generalized least squares method, resulting in the following parameter
estimates.

    System        A           B

    $\beta'$      .151        .169
    $p'$          1.029587    15.331

For System A, Figure 3 presents the Repair Mean Function with one standard
deviation bounds plotted versus observations. The mean function deviates
greatly from the data in interval 12 where nearly 100 repairs were installed
within one interval. At interval 15 the data and mean function again coincide.
This type of data discontinuity is not unexpected in repair rates as it
is common practice to accumulate corrections and install them at a convenient
time.


For  System  B,  Figure  4  shows  a  more  consistent  repair  installation  rate 
during  the  represented  phase  of  testing. 

The instantaneous MTBF of System A, listed for 40 time units
in Figure 5, shows a steady increase throughout the 18 time units of test,
with a projected dramatic increase if the test and repair phase had continued
for 40 units. The MTBR remained relatively stable through the testing interval
but would have been expected to increase if testing had continued for 40
units.

*For clarification: $R(t) = F(t)(1 - \exp[-\beta' t^{p'}])$; assume that $F(t)$ has been
previously determined.

[Figure: Fault mean function and bounds of one standard deviation versus
observations. Horizontal axis: time. Legend: mean; observation; one standard
deviation bounds; multiple point.]

MEAN TIME BETWEEN FAULTS AND MEAN TIME BETWEEN REPAIRS AT SELECTED TIME POINTS

    TIME ABSCISSA      MTBF       MTBR

         1.00          .013       .050
         2.00          .013       .026
         3.00          .014       .020
         4.00          .015       .018
         5.00          .016       .017
         6.00          .019       .017
         7.00          .021       .018
         8.00          .024       .019
         9.00          .028       .020
        10.00          .032       .023
        11.00          .038       .026
        12.00          .044       .029
        13.00          .052       .033
        14.00          .062       .038
        15.00          .073       .044
        16.00          .087       .052
        17.00          .103       .061
        18.00          .123       .071
        19.00          .148       .084
        20.00          .178       .100
        21.00          .214       .118
        22.00          .258       .141
        23.00          .312       .168
        24.00          .377       .200
        25.00          .458       .239
        26.00          .556       .287
        27.00          .677       .344
        28.00          .826       .413
        29.00         1.009       .496
        30.00         1.234       .596
        31.00         1.512       .717
        32.00         1.855       .864
        33.00         2.278      1.041
        34.00         2.803      1.255
        35.00         3.452      1.514
        36.00         4.258      1.827
        37.00         5.258      2.206
        38.00         6.500      2.663
        39.00         8.046      3.216
        40.00         9.971      3.885

Figure 5

The instantaneous MTBF of System B over 32 months is shown in Figure 6. The
test period corresponds to the first 30 units (29.75 time abscissa). The MTBF
shows a decline as testing began and faults were activated, with a stable low
point over eight months before growth began. This testing covered the initial
period of system integration and is, therefore, consistent with intuitively
expected results, as major modules initially interact and interface problems
are discovered. The MTTR decreased through the interval of stable fault
activity and increased as the fault activation rate decreased. This may be
interpreted as representing intense debugging effort (long hours, more
debuggers, high priority) when faults were occurring at a high rate, with a
relaxation of effort as the situation was brought under control.

The Mean Time to Next Fault for System A is shown in Figure 7. Whether a next
fault will occur is not in question until time unit 27, and the mean time to
next fault shows steady growth. The same estimate for System B is shown in
Figure 8, where the same pattern is evident.

The tables of values for estimating the number of faults remaining to be
discovered are given in Figure 9 for System A and Figure 10 for System B. The
selected values of N are incremented by 5 through the range 0-25. The
probability that no more than 25 faults remained in the software of System A
at the end of testing was only .104; for System B it was 1.000. These results
appear plausible for System A but highly implausible for System B, unless the
System B results indicate that the integration testing conditions were nearing
the end of their effectiveness and would invoke few remaining faults. As
System B testing continues under more strenuous conditions, the author expects
new fault types to appear, probably with a new fault mean function for the
next test phase.
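
The rows of Figures 9 and 10 can be cross-checked with a short calculation.
The sketch below assumes that the number of faults remaining after a given
time is Poisson distributed; this section does not state that assumption, but
it is consistent with the tabulated System B row at time abscissa 29.75. The
function name prob_at_most is illustrative only.

```python
import math

def prob_at_most(n, mean_remaining):
    """P(at most n further faults), assuming a Poisson count with this mean."""
    term = math.exp(-mean_remaining)   # P(exactly 0 faults)
    total = term
    for k in range(1, n + 1):
        term *= mean_remaining / k     # P(exactly k) from P(exactly k - 1)
        total += term
    return total

# System B at the end of test (time abscissa 29.75): Figure 10 lists
# P(N = 0) = .149, which implies a mean number of remaining faults.
p_zero = 0.149
mean_remaining = -math.log(p_zero)

# Recompute the row for N = 0, 5, 10, 15, 20, 25.
# Figure 10 tabulates .149, .987, 1.000, 1.000, 1.000, 1.000 for this row.
row = [round(prob_at_most(n, mean_remaining), 3) for n in (0, 5, 10, 15, 20, 25)]
print(row)
```

Under this assumption a single quantity, the mean number of remaining faults,
determines the whole row, which is why the N = 0 column alone is enough to
regenerate the others.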

The expected length of a successful mission was analyzed for both Systems A
and B, where the mission is assumed to use the system under the input
conditions of the test period. Figure 11 presents the results for System A for
missions of length L, where L = .143 time units. At the end of 40 time units
of testing, the probability of executing (or testing) for time L with no new
faults is .986. Figure 12 presents the results for System B in terms of L,
where L is approximately one day. At the end of testing (29.75 time abscissa)
the probability of executing for one day with no new faults is .955.

The Time to Last Fault Distribution for System A, presented in Figure 13 for
times from 40 to 60, has a mean of 38.75 time units and a standard deviation
of 5.082. Although the testing was stopped at time abscissa 18, the
distribution suggests that a test period of 44 time units would have had an
82.8 percent probability of disclosing all faults, and testing to 46.8 would
have had a 90.5 percent probability of disclosing all of the faults. This
information would have been useful to weigh against the cost of testing and
the intended use of the system. For System B (Figure 14), the mean was 31.284
months and the standard deviation was 1.425. The test period (29.75) had an
estimated 18 percent probability of disclosing all faults; a test period of 34
months would have had a 94 percent probability of disclosing all faults.
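
A table such as Figure 13 can also be read in the other direction, from a
desired probability back to a test length. The sketch below linearly
interpolates between tabulated points; the helper name time_for_probability
and the choice of linear interpolation are assumptions made for illustration,
and only a few of the Figure 13 entries are copied in.

```python
def time_for_probability(table, target):
    """Interpolate the time at which a tabulated distribution reaches target.

    table is a list of (time, probability) pairs sorted by time.
    """
    for (t0, p0), (t1, p1) in zip(table, table[1:]):
        if p0 <= target <= p1:
            if p1 == p0:
                return t0
            return t0 + (t1 - t0) * (target - p0) / (p1 - p0)
    raise ValueError("target lies outside the tabulated range")

# Selected System A entries from Figure 13 (time abscissa, distribution value).
figure_13 = [(44.00, 0.828), (44.40, 0.842), (44.80, 0.854), (45.20, 0.866),
             (45.60, 0.877), (46.00, 0.887), (46.40, 0.896), (46.80, 0.905)]

# Test length at which the probability of having seen the last fault is 90%.
t90 = time_for_probability(figure_13, 0.90)
print(round(t90, 2))
```

The result falls between the tabulated abscissas 46.40 and 46.80, in line
with the 90.5 percent figure quoted for 46.8 time units.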
MEAN TIME BETWEEN FAULTS AND MEAN TIME BETWEEN REPAIRS AT SELECTED TIME POINTS

TIME ABSCISSA     MTBF      MTBR

     7.25         .043      .049
     8.25         .033      .038
     9.75         .024      .028
    10.75         .021      .024
    11.25         .019      .022
    12.00         .018      .020
    13.00         .016      .019
    13.50         .016      .018
    14.25         .015      .018
    15.00         .015      .017
    15.25         .015      .017
    16.00         .015      .017
    17.25         .016      .018
    18.50         .018      .019
    19.25         .019      .021
    19.50         .020      .021
    20.25         .023      .023
    21.25         .027      .027
    24.50         .067      .049
    29.75         .775      .169
    29.95         .875      .177
    30.15         .990      .186
    30.35        1.123      .195
    30.55        1.276      .205
    30.75        1.454      .215
    30.95        1.660      .225
    31.15        1.899      .236
    31.35        2.178      .247
    31.55        2.503      .259
    31.75        2.884      .271
    31.95        3.330      .284
    32.15        3.855      .297

Figure 6
PROBABILITY OF A NEXT FAULT AND CONDITIONAL MEAN
AND STANDARD DEVIATION OF TIME TO NEXT FAULT

                 PROBABILITY    CONDITIONAL
                 OF A           MEAN TIME TO    STANDARD
TIME ABSCISSA    NEXT FAULT     NEXT FAULT      DEVIATION

     1.00          1.000           .013           .013
     2.00          1.000           .013           .013
     3.00          1.000           .014           .014
     4.00          1.000           .015           .015
     5.00          1.000           .016           .016
     6.00          1.000           .019           .019
     7.00          1.000           .021           .021
     8.00          1.000           .024           .024
     9.00          1.000           .028           .028
    10.00          1.000           .033           .033
    11.00          1.000           .038           .038
    12.00          1.000           .045           .045
    13.00          1.000           .053           .053
    14.00          1.000           .062           .063
    15.00          1.000           .074           .075
    16.00          1.000           .088           .089
    17.00          1.000           .105           .107
    18.00          1.000           .126           .129
    19.00          1.000           .152           .157
    20.00          1.000           .184           .191
    21.00          1.000           .223           .234
    22.00          1.000           .272           .288
    23.00          1.000           .333           .359
    24.00          1.000           .410           .454
    25.00          1.000           .509           .586
    26.00          1.000           .638           .774
    27.00           .999           .807          1.036
    28.00           .997          1.025          1.372
    29.00           .992          1.294          1.763
    30.00           .980          1.608          2.172
    31.00           .957          1.947          2.565
    32.00           .923          2.292          2.917
    33.00           .874          2.621          3.216
    34.00           .812          2.921          3.461
    35.00           .741          3.185          3.657
    36.00           .664          3.410          3.810
    37.00           .585          3.598          3.928
    38.00           .507          3.753          4.018
    39.00           .434          3.878          4.086
    40.00           .367          3.977          4.137

Figure 7
PROBABILITY OF A NEXT FAULT AND CONDITIONAL MEAN
AND STANDARD DEVIATION OF TIME TO NEXT FAULT

                        PROBABILITY    CONDITIONAL
                        OF A           MEAN TIME TO    STANDARD
POINT   TIME ABSCISSA   NEXT FAULT     NEXT FAULT      DEVIATION

  1         7.25          1.000           .043           .042
  2         8.25          1.000           .033           .033
  3         9.75          1.000           .024           .024
  4        10.75          1.000           .021           .021
  5        11.25          1.000           .019           .019
  6        12.00          1.000           .018           .018
  7        13.00          1.000           .016           .016
  8        13.50          1.000           .016           .016
  9        14.25          1.000           .015           .015
 10        15.00          1.000           .015           .015
 11        15.25          1.000           .015           .015
 12        16.00          1.000           .015           .015
 13        17.25          1.000           .016           .016
 14        18.50          1.000           .018           .018
 15        19.25          1.000           .019           .020
 16        19.50          1.000           .020           .020
 17        20.25          1.000           .023           .023
 18        21.25          1.000           .027           .028
 19        24.50          1.000           .068           .070
 20        29.75           .851           .814           .930
 21        29.95           .810           .859           .960
 22        30.15           .764           .899           .984
 23        30.35           .715           .934          1.003
 24        30.55           .663           .964          1.018
 25        30.75           .610           .989          1.029
 26        30.95           .556          1.010          1.036
 27        31.15           .504          1.026          1.040
 28        31.35           .452          1.039          1.042
 29        31.55           .403          1.048          1.042
 30        31.75           .357          1.055          1.040
 31        31.95           .314          1.058          1.036
 32        32.15           .275          1.060          1.031

Figure 8
PROBABILITY THAT AT MOST N FAULTS ARE ACTIVATED AFTER SELECTED TIME POINTS

TIME                          VALUE OF N
ABSCISSA      0       5      10      15      20      25

  1.00      .000    .000    .000    .000    .000    .000
  2.00      .000    .000    .000    .000    .000    .000
  3.00      .000    .000    .000    .000    .000    .000
  4.00      .000    .000    .000    .000    .000    .000
  5.00      .000    .000    .000    .000    .000    .000
  6.00      .000    .000    .000    .000    .000    .000
  7.00      .000    .000    .000    .000    .000    .000
  8.00      .000    .000    .000    .000    .000    .000
  9.00      .000    .000    .000    .000    .000    .000
 10.00      .000    .000    .000    .000    .000    .000
 11.00      .000    .000    .000    .000    .000    .000
 12.00      .000    .000    .000    .000    .000    .000
 13.00      .000    .000    .000    .000    .000    .000
 14.00      .000    .000    .000    .000    .000    .000
 15.00      .000    .000    .000    .000    .000    .000
 16.00      .000    .000    .000    .000    .013    .104
 17.00      .000    .000    .000    .000    .013    .104
 18.00      .000    .000    .000    .000    .013    .104
 19.00      .000    .000    .000    .000    .013    .104
 20.00      .000    .000    .000    .003    .050    .259
 21.00      .000    .000    .001    .035    .244    .634
 22.00      .000    .000    .013    .172    .585    .900
 23.00      .000    .001    .072    .450    .858    .985
 24.00      .000    .009    .231    .741    .970    .999
 25.00      .000    .041    .479    .916    .996   1.000
 26.00      .000    .125    .723    .981   1.000   1.000
 27.00      .001    .273    .885    .997   1.000   1.000
 28.00      .003    .464    .962   1.000   1.000   1.000
 29.00      .008    .653    .990   1.000   1.000   1.000
 30.00      .020    .802    .998   1.000   1.000   1.000
 31.00      .043    .900   1.000   1.000   1.000   1.000
 32.00      .077    .954   1.000   1.000   1.000   1.000
 33.00      .126    .981   1.000   1.000   1.000   1.000
 34.00      .188    .993   1.000   1.000   1.000   1.000
 35.00      .259    .997   1.000   1.000   1.000   1.000
 36.00      .336    .999   1.000   1.000   1.000   1.000
 37.00      .415   1.000   1.000   1.000   1.000   1.000
 38.00      .493   1.000   1.000   1.000   1.000   1.000
 39.00      .566   1.000   1.000   1.000   1.000   1.000
 40.00      .633   1.000   1.000   1.000   1.000   1.000

Figure 9
PROBABILITY THAT AT MOST N FAULTS ARE ACTIVATED AFTER SELECTED TIME POINTS

         TIME                          VALUE OF N
POINT    ABSCISSA      0       5      10      15      20      25

  1        7.25      .000    .000    .000    .000    .000    .000
  2        8.25      .000    .000    .000    .000    .000    .000
  3        9.75      .000    .000    .000    .000    .000    .000
  4       10.75      .000    .000    .000    .000    .000    .000
  5       11.25      .000    .000    .000    .000    .000    .000
  6       12.00      .000    .000    .000    .000    .000    .000
  7       13.00      .000    .000    .000    .000    .000    .000
  8       13.50      .000    .000    .000    .000    .000    .000
  9       14.25      .000    .000    .000    .000    .000    .000
 10       15.00      .000    .000    .000    .000    .000    .000
 11       15.25      .000    .000    .000    .000    .000    .000
 12       16.00      .000    .000    .000    .000    .000    .000
 13       17.25      .000    .000    .000    .000    .000    .000
 14       18.50      .000    .000    .000    .000    .000    .000
 15       19.25      .000    .000    .000    .000    .000    .000
 16       19.50      .000    .000    .000    .000    .000    .000
 17       20.25      .000    .000    .000    .000    .000    .000
 18       21.25      .000    .000    .000    .000    .000    .000
 19       24.50      .000    .000    .000    .000    .013    .104
 20       29.75      .149    .987   1.000   1.000   1.000   1.000
 21       29.95      .190    .993   1.000   1.000   1.000   1.000
 22       30.15      .236    .996   1.000   1.000   1.000   1.000
 23       30.35      .285    .998   1.000   1.000   1.000   1.000
 24       30.55      .337    .999   1.000   1.000   1.000   1.000
 25       30.75      .390   1.000   1.000   1.000   1.000   1.000
 26       30.95      .444   1.000   1.000   1.000   1.000   1.000
 27       31.15      .496   1.000   1.000   1.000   1.000   1.000
 28       31.35      .548   1.000   1.000   1.000   1.000   1.000
 29       31.55      .597   1.000   1.000   1.000   1.000   1.000
 30       31.75      .643   1.000   1.000   1.000   1.000   1.000
 31       31.95      .686   1.000   1.000   1.000   1.000   1.000
 32       32.15      .725   1.000   1.000   1.000   1.000   1.000

Figure 10
PROBABILITIES OF SUCCESSFUL MISSIONS OF LENGTHS WHICH ARE
QUARTER MULTIPLES OF L = .143 AT SELECTED TIME POINTS

[Table printed sideways in the original; columns are point, time abscissa,
and mission length. The values are not recoverable from the scan.]

Figure 11
PROBABILITIES OF SUCCESSFUL MISSIONS OF LENGTHS WHICH ARE
QUARTER MULTIPLES OF L = .036 AT SELECTED TIME POINTS

Figure 12


VALUES OF THE TIME-TO-LAST-FAULT DISTRIBUTION
WITH MEAN 38.750 AND STANDARD DEVIATION 5.082

POINT    TIME ABSCISSA    DISTRIBUTION VALUE

 102        40.80               .682
 103        41.20               .704
 104        41.60               .726
 105        42.00               .746
 106        42.40               .764
 107        42.80               .782
 108        43.20               .799
 109        43.60               .814
 110        44.00               .828
 111        44.40               .842
 112        44.80               .854
 113        45.20               .866
 114        45.60               .877
 115        46.00               .887
 116        46.40               .896
 117        46.80               .905
 118        47.20               .912
 119        47.60               .920
 120        48.00               .926
 121        48.40               .933
 122        48.80               .938
 123        49.20               .944
 124        49.60               .948
 125        50.00               .953
 126        50.40               .957
 127        50.80               .960
 128        51.20               .964
 129        51.60               .967
 130        52.00               .970
 131        52.40               .972
 132        52.80               .975
 133        53.20               .977
 134        53.60               .979
 135        54.00               .981
 136        54.40               .983
 137        54.80               .984
 138        55.20               .986
 139        55.60               .987
 140        56.00               .988
 141        56.40               .989
 142        56.80               .990
 143        57.20               .991
 144        57.60               .992
 145        58.00               .992
 146        58.40               .993
 147        58.80               .994
 148        59.20               .994
 149        59.60               .994
 150        60.00               .995
 151        60.40               .996

Figure 13


VALUES OF THE TIME-TO-LAST-FAULT DISTRIBUTION
WITH MEAN 31.284 AND STANDARD DEVIATION 1.425

POINT    TIME ABSCISSA    DISTRIBUTION VALUE

 102        32.79               .827
 103        33.11               .866
 104        33.43               .897
 105        33.75               .921
 106        34.07               .941
 107        34.39               .956
 108        34.71               .967
 109        35.03               .976
 110        35.36               .982
 111        35.68               .987
 112        36.00               .991
 113        36.32               .993
 114        36.64               .995
 115        36.96               .997
 116        37.28               .998
 117        37.61               .998
 118        37.93               .999
 119        38.25               .999
 120        38.57              1.000
 121        38.89              1.000
 122        39.21              1.000
 123        39.53              1.000
 124        39.96              1.000
 125        40.18              1.000
 126        40.50              1.000
 127        40.82              1.000
 128        41.14              1.000
 129        41.46              1.000
 130        41.78              1.000
 131        42.11              1.000
 132        42.43              1.000
 133        42.75              1.000
 134        43.07              1.000
 135        43.39              1.000
 136        43.71              1.000
 137        44.03              1.000
 138        44.36              1.000
 139        44.68              1.000
 140        45.00              1.000
 141        45.32              1.000
 142        45.64              1.000
 143        45.96              1.000
 144        46.28              1.000
 145        46.61              1.000
 146        46.93              1.000
 147        47.25              1.000
 148        47.57              1.000
 149        47.89              1.000
 150        48.21              1.000
 151        48.53              1.000

Figure 14

3. Summary. These initial applications of the model demonstrate the
usefulness of the estimation method and the consistency and plausibility of
the estimation. The accuracy of the estimation will be further verified by
continued analysis through the additional testing of System B and by
application of the model to additional systems under test.


PROBABILISTIC  PROGRAM  ESTIMATES  -  COMPARISON  OF  SIMULATED  RESULTS 
USING  BETA  VIS-A-VIS  TRIANGULAR  ACTIVITY  DISTRIBUTIONS 


Conrad  W.  Faber 

U.S. Army Aviation Research and Development Command
St. Louis, Missouri 63120


ABSTRACT.  When  developing  probabilistic  program  estimates  for  systems  in 
the  R&D  stage  from  three  point  estimates  of  component  parts,  the  triangular 
distribution  is  often  assumed  for  the  parts. 

The  model/method  proposed  assumes  the  component  distributions  are  described 
by  beta  probability  densities  and  compares  the  results  vis-a-vis  assuming 
triangular  distributions. 

The  model  assumes: 

a. Germane historical statistical data are not available, nor is a bottom-up
engineering estimate appropriate.

b. Enough knowledge is available to estimate the general shape of the beta
distributions.

c.  The  user  has  access  to  computer  facilities. 

1. INTRODUCTION. To adequately evaluate the risk (time and/or cost) of a
new Army program, the decision-maker needs more than a point estimate, i.e.,
some measure of how much the estimate could be in error. Although methodology
and models exist for developing risk profiles based upon historical data,
frequently a data base is not available or is not sufficiently analogous, or the
physical/performance characteristics of the new system are beyond the reliable
range of the data base. Consequently, estimates for the new system frequently
depend upon the expert opinion of a very few knowledgeable persons.

Often  the  "expert"  is  asked  to  provide  the  range  and  most  likely  (modal) 
estimates  of  several  activities,  e.g.,  costs  to  fabricate  subsystems,  integration 
and  test  costs,  etc.  A  common  method  of  combining  these  activity  three  point 
estimates  is  to  assume  a  triangular  distribution  for  each  activity  and  determine 
a  total  system  cost  by  Monte  Carlo  simulation. 
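
The common method just described can be sketched in a few lines. This is a
minimal illustration, not the VERT model used later in the paper: the three
activities and their (low, mode, high) values are hypothetical, the activities
are assumed independent, and the total is taken as a plain sum.

```python
import random

def simulate_total(estimates, trials=10_000, seed=1):
    """Monte Carlo total of independent triangular activities.

    estimates is a list of (low, mode, high) three-point estimates.
    Returns the simulated totals sorted in ascending order.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # random.triangular takes its arguments in the order (low, high, mode).
        totals.append(sum(rng.triangular(low, high, mode)
                          for low, mode, high in estimates))
    totals.sort()
    return totals

# Hypothetical three-point cost estimates for three activities.
activities = [(10, 12, 20), (5, 6, 9), (8, 10, 18)]

totals = simulate_total(activities)
mean_total = sum(totals) / len(totals)
p80 = totals[int(0.8 * len(totals))]   # cost not exceeded in about 80% of trials
print(round(mean_total, 2), round(p80, 2))
```

The sorted totals form an empirical risk profile: any percentile can be read
off directly, which is the information a point estimate cannot supply.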

The triangular distribution is completely described by the mode and range.
The beta distribution includes one additional shape parameter, which offers
much greater flexibility. Because of its rigidity, the triangular distribution
can easily misrepresent the probabilities within the range, as shown by an
example in Appendix A.

The  proposed  method  described  in  this  paper  suggests  the  expert  select  one 
of  nine  beta  distributions  which  best  describes  the  general  shape  of  the  probability 
distribution  for  each  activity.  Appendix  B  compares  these  nine  distributions 
vis-a-vis  triangular  distributions  with  the  same  range  and  modes.  The  proposed 
method  also  includes  a  simplistic  system  estimate  and  compares  the  results  obtained 
by  using  both  the  beta  and  triangular  distributions. 
PROPOSED METHOD


GENERAL:

a. Uncertainty (risk) of an activity* can be described by a
probability distribution. This uncertainty relates to potential
technical risks and economic factors. Another major source of
uncertainty, not covered in this paper, is requirements uncertainty,
e.g., changes in performance requirements and quantity procured.
For a given risk assessment, requirements are assumed to be fixed.
Program uncertainty is a convolution of the activities' risks.

b. To determine program uncertainty, a PERT-type network must
be developed displaying the activities and events and their major
interdependencies. This network should be correlated to the elements
of the program work breakdown structure. For each activity, a
distribution describes the uncertainty involved in that activity. When
applicable historical data are available or factors are assumed,
appropriate distributions should be used. This paper describes a method
of estimating activity distributions when such data are not available.

c. Once the activity distributions and parameters are specified,
a total program (or intermediate milestone) probability distribution
can be derived by Monte Carlo simulation. Three models (RISCA,
SOLVNET and VERT) are described in DARCOM Handbook 11-1.1-79, Army
Programs: Decision Risk Analysis Handbook. Of the models known to the
author, VERT (Venture Evaluation and Review Technique) is the most
versatile and is used in the sample case in a subsequent section.

d. A major shortcoming of most risk models is the limited number
of distributions built into the basic program and/or the amount of
subjective probabilistic data to be requested from the "expert." Since
many activity distributions are skewed to the right, i.e., the possible
range of an overrun exceeds that of an underrun, the standard normal
distribution is not appropriate. Also, the frequently used triangular
distribution can easily misrepresent the probability densities, as
shown by the examples in Appendixes A and B. The VERT program allows the
analyst a choice of over a dozen density functions. These can be
used to describe activities in terms of time and/or cost.

e. For a program with many activities, the Central Limit Theorem
(CLT) provides an unbiased estimate of the expected mean and variance
of the total program cost** by simply adding the expected values
of the activity means and variances, since the limiting distribution
of additive variables (even from skewed distributions) is normal.
However, for a few skewed activities, or domination by a few skewed
activities, or intermediate milestone distributions based upon a small
number of skewed activities, application of the CLT can give misleading
results regarding variance and skewness.


* Activity is defined as the time or cost to complete a task, whereas
an event is a point in time, e.g., start of flight testing.

** Estimates of program time or cost as a function of time generally
cannot use the CLT. While the expected mean for time can be computed
along the critical path, the distribution for time or cost for
several activities requires more sophisticated techniques, e.g.,
simulation.
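
The additive shortcut in paragraph e can be written out for three-point
estimates. The sketch below uses the standard mean and variance of a
triangular distribution and sums them across activities; adding variances
assumes the activities are independent, and the activity values are
hypothetical.

```python
import math

def triangular_moments(low, mode, high):
    """Mean and variance of a triangular distribution on [low, high]."""
    mean = (low + mode + high) / 3.0
    var = (low**2 + mode**2 + high**2
           - low * mode - low * high - mode * high) / 18.0
    return mean, var

def normal_approximation(estimates):
    """Mean and standard deviation of the total, adding means and variances."""
    moments = [triangular_moments(*e) for e in estimates]
    total_mean = sum(m for m, _ in moments)
    total_var = sum(v for _, v in moments)   # valid only for independent activities
    return total_mean, math.sqrt(total_var)

# Hypothetical (low, mode, high) cost estimates for three activities.
activities = [(10, 12, 20), (5, 6, 9), (8, 10, 18)]

total_mean, total_sd = normal_approximation(activities)
print(round(total_mean, 3), round(total_sd, 3))
```

With only three right-skewed activities, the normal curve built from these two
numbers is symmetric while the true total is not, which is the limitation the
paragraph warns about.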




SELECTION OF DISTRIBUTION(S):

a. When the analyst must determine the activity distributions
from subjective inputs, the analyst usually can only zero in on the
general shape of the distribution and its associated parameters.
Consequently, the following criteria were used to select a distribution
or distributions:

1. Simplicity

2. Could be symmetric, skewed left, or skewed right

3. Could have varying degrees of kurtosis, i.e., concentration
around mean or mode

4. Could be normalized for computer simulation

b. During the 1960's, several theoretical papers (see References
3 through 8) were written regarding cost uncertainty. All of the
authors of these papers chose the beta probability function to
describe activities because of its versatility and simplicity.
Because of the large amount of computer core and central processing
unit (CPU) time required to run a Monte Carlo simulation, most of
these earlier authors advocated the convolution of beta distributions
by the method of moments. This method provides a total program cost
distribution profile but suffers from the same shortcomings as using
the Central Limit Theorem discussed earlier. During the decade of
the 1970's, little use was made of this research. However, with
today's high-speed computers, a complex network can be simulated via
Monte Carlo techniques much more efficiently. (Using the VERT program,
a complex network can be simulated 1000 times with under 240K core and
under 2 minutes CPU time.)

c. This author evaluated several distributions (triangular,
gamma, Weibull, beta, et al.) and reached the same conclusion:
the beta function could adequately describe most activity
distributions and was generally superior to other probability functions
vis-a-vis the criteria listed above. The preceding statement is
not meant to suggest the exclusive use of the beta distribution when
acquiring subjective inputs. There are situations where other
distributions may be more appropriate, e.g., the Poisson distribution
for the expected life of a component or the binomial distribution for
either/or situations.

d. The beta probability density function (pdf) is:

$$ f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{(a-1)} (1-x)^{(b-1)}, \quad \text{where } 0 < x < 1 $$

The parameters "a" and "b" determine the degree of skewness and kurtosis.
The following transformation maps the actual high (H) and low (L) points of
the range onto the beta pdf range of 0 through 1:

x = (X-L)/(H-L), where X is the actual data value.

The computer program, given the beta pdf parameters, randomly selects
x and then transforms it to X for each iteration through the network.
e. When obtaining subjective probabilistic information, the author
has experienced the best results when the choices available to the
experts have the following characteristics:

1. Finite end points which exclude extremely unlikely probabilities

2. Unimodal

3. Continuous rather than discrete

4. Few input parameters required

5. A finite set, with visual illustrations, from which to choose

The beta distribution also met the first four of these criteria.
To meet criterion five, nine representative beta distributions were
selected. The first four are skewed to the right with modes 25% and 40%
of the way through the range. Distributions five through seven are
symmetric, and distributions eight and nine are slightly skewed to the
left with the modes 60% of the way through the range. These nine
distributions are displayed in Appendix C.
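
The beta pdf, its mode, and the transformation between x and X can be
sketched as follows. The shape parameters a = 2 and b = 4 are hypothetical,
chosen only because they place the mode 25% of the way through the range; the
nine distributions actually proposed are those of Appendix C, whose parameters
are not reproduced here. The bounds 10 and 20 are likewise illustrative.

```python
import math
import random

def beta_pdf(x, a, b):
    """Beta density on 0 < x < 1 with shape parameters a and b."""
    coeff = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coeff * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_mode(a, b):
    """Mode of the beta density; valid for a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

def sample_activity(low, high, a, b, rng):
    """Draw x from the beta pdf and transform it to X = L + x (H - L)."""
    x = rng.betavariate(a, b)
    return low + x * (high - low)

a, b = 2.0, 4.0              # hypothetical right-skewed shape
low, high = 10.0, 20.0       # hypothetical L and H for one activity

rng = random.Random(7)
draws = [sample_activity(low, high, a, b, rng) for _ in range(5000)]
sample_mean = sum(draws) / len(draws)

print(beta_mode(a, b))                     # fraction of the range at the mode
print(low + beta_mode(a, b) * (high - low))  # most likely value M in actual units
print(round(sample_mean, 2))
```

Comparing the expert's stated most likely value with the mode implied by the
selected shape is one way to apply the consistency check that the redundant
inputs make possible.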

INPUT  REQUIRED: 

a.  To  determine  program  or  subprogram  uncertainties,  the  activities 
and  events  must  be  defined  and  their  interrelationships  established. 

This  is  best  depicted  in  a  PERT  type  network.  Although  it  is  not  the 
purpose  of  this  paper  to  describe  how  to  construct  this  network,  the 
following  general  comments  indicate  the  flexibility  available. 

1.  Branching  probability  paths  can  be  constructed,  e.g.,  probabil¬ 
ities  of  failure  causing  program  stop,  sufficient  problems  to  cause  major 
redesign,  or  adequate  success  to  continue  work  as  originally  planned. 

This  branching  may  be  activated  by  cost  and/or  time  constraints. 

2.  Activities  can  be  described  in  terms  of  time  or  cost  risk.  Care 
should  be  taken  to  include  the  Interdependancies  of  events.  Activities 
may  have  to  be  subdivided  for  this  purpose,  e.g.,  design  of  item  A  into 
preliminary  design  and  final  design  of  A  because  the  prelimary  design 

of  A  is  required  before  item  B  can  be  designed. 

3*  Time  uncertainty  usually  assumes  a  normal  work  pace  (e.g.,  a  40 
hour  work  week).  Analysis  can  then  determine  critical  activities  which 
allows  management  the  option  of  selected  overtime  or  reallocation  of 
resources  and  awareness  of  which  ac t ivit ies /event s  to  monitor  closely* 

4. Cost is frequently determined as a linear function of time,
i.e., cost = a + bx, where a is a base constant, b is a cost per
unit of time, and x is the time. This is based on the close relationship
between cost and time where time is a function of technical uncertainty.

5. Activities/events become more specific as a program is defined
in more detail. E.g., to monitor the risk in an ongoing program, the
activities/events for the next 6 months are in more detail than those
farther in the future whereas past activities are now given a fixed
number.

6. A well constructed program network may combine elements of all
the  above. 

The inputs required for risk analysis can be readily seen from the
developed network. The suggested procedure which follows assumes activity
estimates cannot be obtained by traditional parametric
statistical relationships.
b. Parameters required to describe an activity's uncertainty, using
the beta distribution, are: the lower and upper bounds, the most likely
value, and a choice of one of the nine beta distributions shown in
Appendix C. Note that there is a redundancy between the most likely
value and the beta distribution selected. This redundancy provides a
check on the consistency of the information provided.

1. The high (H) or pessimistic bound assumes significant aspects
of the activity develop problems but excludes extremely unlikely or
catastrophic occurrences such as a tornado destroying a prototype or
a national transportation strike. There should be little chance of
exceeding this bound; a workable guideline is no more than one chance
in a hundred.

2. The low (L) or optimistic bound is defined similarly to the
high estimate, except the most favorable conditions exist.

3. The most likely value or mode (M) is that estimate which has
the greatest possibility of occurring.

4. Unless the distribution is symmetric around the mean, the mode
is different from the mean. The above terms are illustrated by a
hypothetical example in Appendix D.
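The relationship between the bounds, the mode and the mean can be sketched in a few lines. The following is a minimal illustration, assuming the standard beta(a, b) density rescaled to the interval from L to H; the function name `scaled_beta_summary` is hypothetical, and the numbers are those of the yard-mowing example in Appendix D.

```python
def scaled_beta_summary(a, b, low, high):
    """Return (mode, mean) of a beta(a, b) distribution rescaled to [low, high].

    Assumes a > 1 and b > 1, so the density has a single interior mode.
    """
    span = high - low
    mode = low + span * (a - 1.0) / (a + b - 2.0)
    mean = low + span * a / (a + b)
    return mode, mean


# Appendix D example: a = 2, b = 4, range 1.5 to 3.5 hours.
mode, mean = scaled_beta_summary(2.0, 4.0, 1.5, 3.5)
print(f"mode = {mode:.2f} hours, mean = {mean:.2f} hours")
```

Comparing the mode returned here with the most likely value supplied by the estimator is one way to carry out the consistency check that the redundancy allows.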

DATA  COLLECTION: 

a. Unless the person providing the information has experience
in this method of estimating, the personal interview method is preferred.
Although it may require more time and money, the analyst
has more confidence in the reliability of the inputs. When two or
more estimators are available, the Delphi technique may be used.
Other data collection techniques are discussed in DARCOM Handbook
11-1.1-79, Reference 2.

b.  Some  general  points  the  interviewer  should  consider  are: 

1. The interviewer must understand and be able to describe the program,
scope of work, and the network in adequate detail to answer questions
by the estimator and to ask the right questions.

2. Allow sufficient time for the interview. Try to pick a
setting which minimizes interruptions.

3. The mode is the point most likely, i.e., the point with the
most chance of being correct. It may not be the mean or expected
value. To assist the interviewer, the modes and means are given
in Appendix C with the nine beta distributions. Also given are
the areas under the decile and quartile tails of the distributions.

4. The low and high points of the range should be reasonable.
This includes the possibility that many events could be favorable
or unfavorable but nothing catastrophic would happen.

5. Because of the redundancy in input, the interviewer can
quickly check for consistency. However, an atmosphere of cooperation
should be promoted to minimize defensive reactions. Also,
the interviewer should be careful not to introduce bias into
the estimates received.

6. The interviewer should remain alert to the estimator's
understanding of the process and his knowledge of what is being
estimated.

7.  As  a  result  of  additional  data  acquired,  the  program  network 
may  need  to  be  revised. 
SAMPLE CASE:

a. Situation: A missile system is to be developed using an
existing proven system as the base. The only major change will be
in the guidance subsystem. The system is composed of five subsystems:
airframe (A), propulsion (P), guidance (G), peculiar ground support
equipment (PG) and common ground support equipment (CG). No major
problems in subsystem interfaces are expected. The first four subsystems
will be designed, fabricated or modified (DFM). These four
subsystems will then each be component tested and fixed (CTF) as
necessary. Meanwhile CG will be acquired (ACG). Next all subsystems
will be integrated and fixed (IF) as necessary, followed by a
complete system test (ST).

b. The above relationships are shown in Exhibit 1. Event names
follow the abbreviations above. Figures below each line indicate the
beta type distribution plus the low, mode and high cost estimates.
E.g., for the activity CTFG

[Network arc labeled CTFG, annotated "B2: 100, 125, 200"]

the guidance subsystem is component tested and fixed as necessary.
The cost uncertainty is described by the beta type 2 distribution
with a range from 100 to 200 and a mode of 125. Exhibit 2 portrays
the same data in tabular form.

c. Analysis:

1. Although the mode is that value which occurs most often, it
is not the expected value for an activity. (Reference example in
Appendix D.) In the Sample Case, the point estimate for the total
program, determined by adding the modes, is 1695 whereas the sum of
the activity expected values is 1775. The mode method underestimates
the costs by 80 units or 5 percent. The mode method typically underestimates
a program's cost or time when the program component
activities are skewed to the right, i.e., the range of an overrun
exceeds that of an underrun. The program total mean value, as determined
by simulation, will normally not equal the expected mean value
because of the random selection of activity values during simulation;
however, the two methods should have totals within ±1 percent.
Although the point estimate as determined by the expected value or by
simulation is superior to the mode method, the decision maker still
has no quantification of the uncertainties about the point estimate,
i.e., some measure of how much the estimate could be in error.

2.  Simulating  the  Sample  Case  network  by  Monte  Carlo  techniques 
provides  a  convolution  of  the  activity  probability  distributions  and 
provides  a  measure  of  the  uncertainty  around  the  point  estimate. 
Exhibit 3 displays probabilities and costs for selected events.
E.g., using the beta distribution, there is a 75 percent chance that
the  cost  of  the  program  will  equal  or  be  less  than  1831  units  or  a 
25  percent  chance  that  the  cost  will  exceed  1831  units.  Exhibits  4.1 
thru  4.3  display  the  VERT  output  for  the  Sample  Case  for  the  same 
events  summarized  on  Exhibit  3.  Using  the  VERT  model,  output  can  be 
generated  at  any  event. 
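The two point estimates and the simulated spread discussed above can be approximated with a short script. This is only a sketch, not the VERT model: it assumes every activity cost is independent and is simply added into the program total (no branching), and it uses the Python standard library generator `random.betavariate` in place of VERT's sampler. The activity data are transcribed from Exhibit 2; the names `ACTIVITIES`, `simulate_totals` and `percentile` are hypothetical.

```python
import random

# (activity, a, b, low, mode, high), transcribed from Exhibit 2.
ACTIVITIES = [
    ("DFMA", 3.0, 4.0, 160, 200, 260),
    ("DFMP", 2.0, 2.5, 220, 300, 420),
    ("DFMG", 2.0, 4.0, 400, 500, 800),
    ("DFMPG", 2.0, 4.0, 40, 50, 80),
    ("ACG", 3.0, 3.0, 40, 50, 60),
    ("CTFA", 3.0, 4.0, 16, 20, 26),
    ("CTFP", 3.0, 4.0, 60, 80, 110),
    ("CTFG", 2.0, 4.0, 100, 125, 200),
    ("CTFPG", 2.0, 2.5, 16, 20, 26),
    ("IF", 2.0, 4.0, 150, 200, 350),
    ("ST", 2.0, 2.5, 100, 150, 225),
]


def expected_value(a, b, low, high):
    """Mean of a beta(a, b) distribution rescaled to [low, high]."""
    return low + (high - low) * a / (a + b)


def simulate_totals(iterations, seed=1981):
    """Return sorted program totals, one per Monte Carlo iteration."""
    rng = random.Random(seed)
    totals = []
    for _ in range(iterations):
        total = 0.0
        for _name, a, b, low, _mode, high in ACTIVITIES:
            total += low + (high - low) * rng.betavariate(a, b)
        totals.append(total)
    totals.sort()
    return totals


def percentile(sorted_values, fraction):
    """Value below which roughly the given fraction of the sample falls."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


mode_total = sum(row[4] for row in ACTIVITIES)
mean_total = sum(expected_value(a, b, lo, hi)
                 for _n, a, b, lo, _m, hi in ACTIVITIES)
totals = simulate_totals(20000)
sim_mean = sum(totals) / len(totals)
print("sum of modes:", mode_total)
print("sum of expected values: %.0f" % mean_total)
print("simulated mean: %.0f" % sim_mean)
print("75 percent point: %.0f" % percentile(totals, 0.75))
```

Because the sampler, seed and iteration count differ from the VERT runs, the simulated figures should be expected to land near, not exactly on, the values in Exhibit 3.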
SAMPLE CASE NETWORK

[Network diagram not reproduced]

EXHIBIT 1


SAMPLE CASE DATA

              Beta Parameters
Activity         a       b        L        M        H    E[x]*    Sim**

DFMA            3.      4.      160.     200.     260.     203.     203.
DFMP            2.      2.5     220.     300.     420.     309.     301.
DFMG            2.      4.      400.     500.     800.     533.     536.
DFMPG           2.      4.       40.      50.      80.      53.      53.
ACG             3.      3.       40.      50.      60.      50.      50.
CTFA            3.      4.       16.      20.      26.      20.      20.
CTFP            3.      4.       60.      80.     110.      81.      82.
CTFG            2.      4.      100.     125.     200.     133.     132.
CTFPG           2.      2.5      16.      20.      26.      20.      20.

Subtotal
at event "J"                   1052.    1345.    1982.    1402.    1397.

IF              2.      4.      150.     200.     350.     217.     215.
ST              2.      2.5     100.     150.     225.     156.     150.

TOTAL PROGRAM                  1302.    1695.    2557.    1775.    1762.

*  Expected value for the normalized beta distribution is a/(a+b),
which is converted by multiplying by the range and adding the lower
limit.

** Activity mean values resulting from simulating the activities by
1000 iterations through the network using the VERT model.

EXHIBIT 2
SAMPLE CASE

Probability Points for Convoluted Distributions

Event*     Probability    Beta Dist.    Triangular Dist.**

XG            .10             576              598
              .25             610              638
              mean            668              705
              .75             719              763
              .90             778              837

J             .10            1285             1331
              .25            1330             1385
              mean           1397             1457
              .75            1459             1525
              .90            1524             1595

FINISH        .10            1632             1712
              .25            1689             1764
              mean           1762             1848
              .75            1831             1927
              .90            1899             1994

*  Costs are for all activities leading to the event.

XG: Completion of guidance subsystem prior to integration
with other subsystems.

J: Cost of all subsystems before system integration.

FINISH: Cost of total program.

** The triangular distribution results assumed the same range and mode for
each activity as that used for the beta.


EXHIBIT 3
CONCLUDING REMARKS:


Quantifying the uncertainty around the point estimates (time or cost)
for new or ongoing programs is desirable, even necessary, and is
becoming a standard procedure within the Department of the Army.
The professionalism of analysts dictates that they maintain awareness of
new and revised techniques. With the increasing availability of efficient
and fast computer equipment, former analytical methods can now be
performed economically. Although the method proposed in this paper is
not new, its use has been limited by unawareness, limited availability of
computer models with random number generators for many probability
distributions, and limited computer capability. The latter two reasons
are no longer true, and a major purpose of this paper is to promote
more widespread awareness of the capabilities available to analysts
and decision makers.

Although this method does not reduce the amount of uncertainty
in a program, it does attempt to quantify it in a more precise manner.
Provided with this additional knowledge, the decision maker should be
able to make better decisions and allocations of our available resources.
BETA VIS-A-VIS TRIANGULAR DISTRIBUTIONS

SPECIFIC EXAMPLE: Compared are the beta type 2 distribution and a
triangular distribution, both with the same range and mode. For this
case, as shown on the graph and chart below, the triangular distribution
has significantly less area in the low range and more in the
high range. Also, the expected value or mean of the beta and triangular
[...] will change with different shaped beta distributions. Whereas both
distributions include the parameters of range and mode, the beta
parameters include a shape parameter which allows greater discretion
in describing the uncertainty in an activity. However, under certain
conditions, the triangular distribution may be as accurate as experience
will justify.
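The difference in area and in mean can be checked numerically on the unit range. The sketch below assumes the beta type 2 distribution is beta(2, 4) with mode 0.25, as listed among the Appendix A panel titles, and uses the closed-form distribution function that holds for a beta(2, b) density; the function names are hypothetical.

```python
def beta_2_b_cdf(x, b):
    """Distribution function of beta(2, b) on [0, 1].

    Valid only for first shape parameter a = 2:
    F(x) = 1 - (1 - x)**b * (1 + b*x).
    """
    return 1.0 - (1.0 - x) ** b * (1.0 + b * x)


def triangular_cdf(x, mode):
    """Distribution function of the triangular distribution on [0, 1]."""
    if x <= mode:
        return x * x / mode
    return 1.0 - (1.0 - x) ** 2 / (1.0 - mode)


MODE = 0.25
beta_low_area = beta_2_b_cdf(MODE, 4.0)        # area below the mode, beta
tri_low_area = triangular_cdf(MODE, MODE)      # area below the mode, triangular
beta_mean = 2.0 / (2.0 + 4.0)
tri_mean = (0.0 + MODE + 1.0) / 3.0

print("area below mode: beta %.3f, triangular %.3f" % (beta_low_area, tri_low_area))
print("mean: beta %.3f, triangular %.3f" % (beta_mean, tri_mean))
```

With the same range and mode, the triangular distribution places less probability below the mode and has the larger mean, which agrees with the higher triangular figures in Exhibit 3.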


APPENDIX  A 


[Figures not reproduced: beta vis-a-vis triangular density plots. Panel titles:]

BETA (2.00, 4.00), MODE = 0.25
BETA (2.00, 2.50), MODE = 0.40
BETA (3.00, 4.00), MODE = 0.40
BETA (2.00, 2.00), MODE = 0.50
BETA (3.00, 3.00), MODE = 0.50
BETA (2.50, 2.00), MODE = 0.60
Parameters and Data of Beta Distributions

HYPOTHETICAL EXAMPLE


Mode vis-a-vis Mean

PROBLEM:  How  much  should  you  pay  the  neighbor  boy  for  mowing  your  yard? 

SITUATION: The price is normally a fixed price arrangement. You
consider $2.00 an hour a fair price and most of the time it takes 2
hours to do the job. Many times there is little rain and the
resulting shorter grass can be mowed more quickly. However, if
there is some wind, small branches fall on the lawn and the boy
must pick them up before he mows. Occasionally, the wind blows down
many branches. Extremely rare occurrences are ignored, e.g., extended
drought or a tornado. You estimate the job will take from 1.5 to
3.5 hours with the most likely time of 2.0 hours and a distribution
shaped like that shown below.

CONCLUSION: Since the distribution is a beta distribution with a/b
parameters of 2/4, the average expected time is 2.17 hours or $4.34
at $2.00 per hour.
ON THE DISTRIBUTION OF A LINEAR COMBINATION
OF  MULTINOMIAL  VARIABLES 

John  C.  Conlon 

U  S  Army  Materiel  Systems  Analysis  Activity 
Aberdeen  Proving  Ground,  Maryland  21005 


ABSTRACT. A user-provided subjective comparison of the quality of a
service or product as furnished by two different agents is generally done by
having a sample of users rate one or other of the agents as better. If the
same users are not exposed to the service provided by both agents, a different
technique of evaluating the agents is required. In this paper, a technique
of having some users rate one agent and different users rate the other on a
scale of 0 to K is proposed. A test of hypothesis is given; the distribution
of the test statistic is obtained by simulation. The program is listed and
some sample output is included.

1.  INTRODUCTION.  In  this  report  we  describe  a  technique  for  evaluating 
a  subjective  rating  of  a  product  or  service.  A  typical  situation  might  be 
when  two  performing  agents  are  manufacturing  a  product  or  providing  a  service. 
Each  agent  is  being  rated  subjectively  as  to  the  quality  of  the  product  or 
service.  We  develop  a  parameter  useful  in  comparing  the  two  agents  and  a 
statistic  for  testing  the  difference.  The  statistic  is  asymptotically  normal, 
but  we  needed  to  determine  its  distribution  for  small  sample  sizes.  We 
accomplished  this  using  a  Monte  Carlo  simulation. 

In  Section  2  we  describe  the  problem  with  the  inherent  difficulties  in 
solving  it  analytically.  We  then  detail  how  we  simulate  the  distribution  of 
the  test  statistic.  In  Section  3  we  document  the  functions  of  the  program  and 
its  subroutines.  In  the  Appendix,  we  give  a  listing  of  the  program,  input 
requirements,  and  sample  output.  We  also  include  a  graph  of  the  distribution 
showing  how  it  approaches  normality  as  sample  sizes  are  increased. 

2.  DESCRIPTION  OF  THE  PROBLEM  AND  THE  SIMULATION.  Whenever  a  subjective 
rating  of  the  quality  of  some  product/service  is  performed  it  usually  assumes 
the  following  form.  Subjects  are  requested  to  rate  on  a  scale  from  zero  to 
"k"  the  quality  of  the  product  or  service.  The  number  "k"  can  be  any  integer 
larger  than  zero  with  the  relationship  that  the  larger  the  value  of  k,  the 
finer  is  the  delineation  desired.  When  a  number,  say  n,  of  subjects  rate  the 
product/service, then the numbers of observations in each of the k + 1 categories
represent a sample from a (k + 1)-dimensional multinomial distribution.

The  situation  often  arises  when  two  performing  agents  are  providing  the 
product/service  and  the  analyst  desires  to  determine  which  one  is  performing 
better.  In  some  cases  it  may  be  impossible  to  have  a  subject  choose  which 
he  thinks  to  be  the  better  of  the  two  after  having  tested  each  one.  This  may 
occur  when  the  performing  agents  are  providing  the  product/service  at  two 
different  times  and  the  same  subjects  are  not  available.  In  this  case  it 
seems  reasonable  to  have  n  subjects  rate  performing  agent  A  and  m  other 
subjects rate performing agent B. This provides the analyst with independent
samples of size n and m from two multinomial distributions. We may then
test for the equivalence of two multinomial distributions. There are some
well established procedures for accomplishing this. For our situation this
may not be entirely appropriate, since to test the equivalence of two multinomial
distributions is actually to test the equivalence of the probabilities
of being in each category for the two distributions. As an alternative to the
classical approach, consider the parameter $\sum_{j=0}^{k} a_j p_j$, where $p_j$ is the probability
of an observation being in the jth category and $a_j$ is a weighting factor
proportional to the desirability of being in category j. We require for convenience
that $a_0$ be zero and that all weights be integer values. A reasonable
test to determine the better of the two performing agents is to test the
hypothesis $\sum_{j=0}^{k} a_j (p_j - p_j^*) = 0$, where $p_j^*$ denotes the corresponding
probability for the second agent.


The statistic chosen for testing the above hypothesis is $\sum_{j=0}^{k} a_j (M N_j - N M_j)$,
where $N_j$ and $M_j$ are the number of responses in the jth category, respectively,
for each distribution, and N and M are the two sample sizes. This statistic
was chosen since it always yields an
integer value. We denote this statistic Q and study the distribution of Q.
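As a small illustration of the definition, the statistic can be computed directly from the two vectors of category counts. The function name `q_statistic` and the example counts are hypothetical; only the formula comes from the text.

```python
def q_statistic(weights, counts_a, counts_b):
    """Compute Q = sum_j a_j * (M * N_j - N * M_j).

    counts_a holds N_0..N_k for the first agent (sample size N),
    counts_b holds M_0..M_k for the second agent (sample size M).
    """
    n_total = sum(counts_a)
    m_total = sum(counts_b)
    return sum(a * (m_total * n_j - n_total * m_j)
               for a, n_j, m_j in zip(weights, counts_a, counts_b))


# Three categories with weights 0, 1, 2; N = 5 raters for A, M = 4 for B.
print(q_statistic([0, 1, 2], [1, 2, 2], [2, 1, 1]))
```

With integer weights and integer counts every term is an integer, which is the property the text cites for choosing this form. When the two samples have identical category proportions the statistic is zero.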

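The probability that Q does not exceed x, $F_Q(x)$, is

$$F_Q(x) = \sum_{u \le x} \; \sum_{\{(\mathbf{m},\mathbf{n}) :\, \sum_{j=0}^{k} a_j (M n_j - N m_j) = u\}} \binom{N}{n_0, n_1, \ldots, n_k} \binom{M}{m_0, m_1, \ldots, m_k} \prod_{j=0}^{k} p_j^{n_j} (p_j^*)^{m_j}.$$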


While this is an exact expression for the distribution function $F_Q$, it is of
little value in constructing a test of the null hypothesis, $\sum_{j=0}^{k} a_j (p_j - p_j^*) = 0$.
Even when the null hypothesis is true, it is impossible to compute percentage
points, the reason being that for any set of parameters $p_0, p_1, \ldots, p_k$ there
are an uncountable number of values for $p_0^*, p_1^*, \ldots, p_k^*$ such that
$\sum_{j=0}^{k} a_j (p_j - p_j^*) = 0$.


Since the random vectors N and M are asymptotically multivariate normal, the
statistic Q is also asymptotically normal. When the null hypothesis is true, the
mean of Q is zero and the variance of Q is

[expression not legible in the source]

In order to construct an approximate test of the hypothesis, $\sum_{j=0}^{k} a_j (p_j - p_j^*) = 0$,
using the normal distribution we must use the estimates $N_j/N$ and $M_j/M$ for $p_j$
and $p_j^*$, respectively, j = 0, 1, ..., k. Since this is an asymptotic result,
a large sample size is required (N, M > 25 is recommended).
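The variance expression does not survive in this copy. As a hedged reconstruction, assuming only the model stated above (two independent multinomial samples of sizes N and M, with Q as defined earlier), the standard multinomial moments give the form below. The symbols $\sigma^2$ and $\sigma^{*2}$ are introduced here for illustration and are not taken from the original.

```latex
\begin{aligned}
Q &= M \sum_{j=0}^{k} a_j N_j \;-\; N \sum_{j=0}^{k} a_j M_j, \\
\sigma^2 &= \sum_{j=0}^{k} a_j^2 p_j - \Bigl(\sum_{j=0}^{k} a_j p_j\Bigr)^{2},
\qquad
\sigma^{*2} = \sum_{j=0}^{k} a_j^2 p_j^* - \Bigl(\sum_{j=0}^{k} a_j p_j^*\Bigr)^{2}, \\
\operatorname{Var}\Bigl(\sum_{j} a_j N_j\Bigr) &= N \sigma^2,
\qquad
\operatorname{Var}\Bigl(\sum_{j} a_j M_j\Bigr) = M \sigma^{*2}, \\
\operatorname{Var}(Q) &= M^2 N \sigma^2 + N^2 M \sigma^{*2}
= N M \bigl( M \sigma^2 + N \sigma^{*2} \bigr).
\end{aligned}
```

Replacing $p_j$ and $p_j^*$ by the estimates $N_j/N$ and $M_j/M$ gives a plug-in variance for the normal approximation.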

The problem remains as to how to handle the small sample situation. We
decided to simulate the distribution of Q for small sample sizes. The simulation
involves the following steps (which are explained in more detail later):

•  Randomly generate a set of parameters $p_0, p_1, \ldots, p_k$.

•  Deterministically construct sets of parameters $p_0^*, p_1^*, \ldots, p_k^*$ such
that in each instance $\sum_{j=0}^{k} a_j (p_j - p_j^*) = 0$, or any real number.

•  For each set of parameters, one hundred random samples of size N (M)
are drawn from the associated multinomial distribution.

•  The value of the statistic Q is computed and recorded.

Each of the four steps above is repeated one thousand times with the result
that approximately $100{,}000\,(2^{k-1} - 1)$ values of Q are used in the tabulation of
the distribution of Q. Let us examine now in detail the steps enumerated above.


The parameters $p_0, p_1, \ldots, p_k$ are generated according to the density
$f(p_1, p_2, \ldots, p_k) = k!\, I(\mathbf{p} : 0 < \sum_{j=1}^{k} p_j < 1)$,¹ with
$p_0 = 1 - \sum_{j=1}^{k} p_j$. This distribution
was chosen as being uniform on the hyperplane defined by
$\{\mathbf{x} : 0 < \sum_{i=1}^{k} x_i < 1\}$. To
generate the values for $p_0, p_1, \ldots, p_k$, we use the following procedure. The
marginal density of $p_1$ is given by

$$f(x) = k (1 - x)^{k-1},$$

so that the cumulative distribution function for $p_1$ is given by

$$F(x) = 1 - (1 - x)^{k}.$$

The conditional cumulative distribution of $p_i$ given $p_1, p_2, \ldots, p_{i-1}$ is

$$1 - \left[ \frac{1 - p_1 - p_2 - \cdots - p_i}{1 - p_1 - p_2 - \cdots - p_{i-1}} \right]^{k-i+1}.$$


¹ The function I(A) is the characteristic function on the set A.
To generate $p_i$ after having generated $p_1, p_2, \ldots, p_{i-1}$ we generate an observation
from the uniform distribution on the interval (0, 1). Let us denote this
observation u. The value for $p_i$ is then

$$(1 - p_1 - p_2 - \cdots - p_{i-1}) \left( 1 - (1 - u)^{1/(k-i+1)} \right).$$

After the values for $p_1, p_2, \ldots, p_k$ have been generated, we set $p_0$ to the
quantity $1 - \sum_{i=1}^{k} p_i$, so that $\sum_{i=0}^{k} p_i = 1$.
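The sequential inversion just described can be sketched as follows. This is an illustration under stated assumptions, not the listed program: it inverts the conditional distribution with exponent 1/(k - i + 1), which reduces to the stated marginal of the first component when i = 1, and it uses Python's `random.Random` in place of RANF. The function name `random_probabilities` is hypothetical.

```python
import random


def random_probabilities(k, rng):
    """Draw (p_0, ..., p_k) uniformly over the probability simplex."""
    p = [0.0] * (k + 1)
    used = 0.0  # running total p_1 + ... + p_(i-1)
    for i in range(1, k + 1):
        u = rng.random()
        # Invert the conditional distribution of p_i given the earlier values.
        p[i] = (1.0 - used) * (1.0 - (1.0 - u) ** (1.0 / (k - i + 1)))
        used += p[i]
    p[0] = 1.0 - used
    return p


rng = random.Random(12345)
print(random_probabilities(3, rng))
```

Under the uniform density every component has mean 1/(k + 1), which gives a quick sanity check on the generator.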

After a set of values for $p_0, p_1, \ldots, p_k$ has been generated, the next
step is to construct values for $p_0^*, p_1^*, \ldots, p_k^*$ such that
$\sum_{i=0}^{k} a_i (p_i - p_i^*) = 0$, or
any real number. We chose to select the $p_i^*$'s deterministically rather than by
a random selection process. Our thought was to select sets of values for the
$p_i^*$'s which represent a wide range of possibilities. To this end we chose to
select one set for each of the following situations: one non-zero $p_i^*$, two
non-zero $p_i^*$'s, etc. For the case involving two or more non-zero $p_i^*$'s we
set the constraint that $a_i p_i^* = a_j p_j^*$ for each i, j such that $p_i^*, p_j^*$ are non-zero.
This is an artificial constraint chosen solely for ease of programming. All
possible combinations of each number of non-zero values are attempted although
some combinations could result in the sum of the $p_i^*$'s greater than one. When
this occurs, the values are discarded and the next set of $p_i^*$'s is constructed.
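The deterministic construction can be sketched for the null case, in which the weighted sum of the constructed probabilities must equal that of the generated ones. The sketch assumes all weights other than the first are positive, and it enumerates subsets with `itertools.combinations` instead of the index bookkeeping used in the listed program. The function name `candidate_alternatives` is hypothetical.

```python
from itertools import combinations


def candidate_alternatives(weights, p):
    """Yield vectors p* with sum(a_j * p*_j) equal to sum(a_j * p_j).

    For each non-empty subset of categories 1..k the non-zero entries are
    chosen so that a_i * p*_i is the same for every member of the subset.
    Sets whose entries sum to more than one are discarded.
    """
    k = len(weights) - 1
    target = sum(a * q for a, q in zip(weights, p))
    for size in range(1, k + 1):
        for subset in combinations(range(1, k + 1), size):
            p_star = [0.0] * (k + 1)
            for j in subset:
                p_star[j] = target / (weights[j] * size)
            total = sum(p_star)
            if total > 1.0:
                continue  # not a valid probability vector
            p_star[0] = 1.0 - total
            yield p_star


weights = [0, 1, 2, 3]
p = [0.5, 0.25, 0.125, 0.125]
for p_star in candidate_alternatives(weights, p):
    print([round(v, 4) for v in p_star])
```

For a power calculation the same construction would use a shifted target in place of the null value; that variant is not shown here.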

After a single set of values for $p_0, p_1, \ldots, p_k$ and $p_0^*, p_1^*, \ldots, p_k^*$ has
been established, we then select one hundred samples each of size N and M,
respectively, from each multinomial distribution. For each pair of samples
the statistic Q is computed and recorded. The results are tabulated and displayed
as a discrete distribution function in 140 steps. The length of the
steps is a function of the weights and the sample sizes.


3. DESCRIPTION OF THE PROGRAM. The main program receives the input
requirements, directs the development of the $p_i$'s and the $p_i^*$'s, and prepares
the results for writing on a permanent file and the output file. We chose to
have the data written onto a permanent file so that the distribution can be
reaccessed when required and plotted, if desired.

Subroutine  WRITE  prints  the  data  onto  the  output  file. 

Subroutine PROB is responsible for the generation of the $p_i$'s. The intrinsic
function, RANF, of course, generates observations from the uniform distribution
on the interval (0, 1). Each successive value for p beginning with $p_k$ is
computed using the transformation, based on the conditional distribution of $p_i$
given $p_{i+1}, p_{i+2}, \ldots, p_k$, given in Section 2. The values for $p_1, p_2, \ldots, p_k$
are generated randomly whereas $p_0$ is set at $1 - \sum_{i=1}^{k} p_i$, so that the sum of the
$p_i$'s is one.


Subroutine FANCY is responsible for generating the sets of $p_i^*$'s and for
directing the selection of observations from the appropriate multinomial
distributions. For each set of values of $p_0^*, p_1^*, \ldots, p_k^*$ corresponding to
values of $p_0, p_1, \ldots, p_k$, one hundred samples each of sizes N and M are
drawn. The statistic Q is calculated for each sample drawn and the value
is sent to subroutine COMP which records the value and stores it. Within
the subroutine there is a method for computing all the values for
$p_0^*, p_1^*, \ldots, p_k^*$ in two stages. Firstly, we determine which $p_i^*$'s will be
non-zero. Every non-empty subset of $\{p_1^*, p_2^*, \ldots, p_k^*\}$ is tried. Secondly,
we choose the non-zero $p_i^*$'s so that $\sum_{j=0}^{k} a_j p_j - \sum_{j=0}^{k} a_j p_j^* = 0$, or any real number,
and $a_i p_i^* = a_j p_j^*$ for all i, j such that $p_i^*, p_j^* \ne 0$. Once it is determined that
$\sum_{i=1}^{k} p_i^* \le 1$, the parameter $p_0^*$ is set equal to $1 - \sum_{i=1}^{k} p_i^*$. At this point the
subroutine directs the sampling of observations from the appropriate multinomial
distributions. Subroutine MULT is responsible for producing a single observation
from a multinomial distribution. Subroutine FANCY calls this routine,
receives the observation, and records it. Following this, subroutine FANCY
computes the value of the statistic Q and sends it to subroutine COMP which
records the value and stores the accumulation of values for final output.
As soon as all the possible values for $p_0^*, p_1^*, \ldots, p_k^*$ within the framework
specified have been exhausted, control is transferred back to the main program
for generation of a new set of parameters $p_0, p_1, \ldots, p_k$.

Subroutine MULT generates a single observation from a multinomial distribution
with parameters $p_0, p_1, \ldots, p_k$. The interval (0,1) is partitioned into
the subintervals $(0, p_0), (p_0, p_0 + p_1), \ldots, (p_0 + p_1 + \cdots + p_{k-1}, 1)$. Suppose we
label these subintervals 0, 1, ..., k. A random number is generated using the
intrinsic function RANF. If the number generated is contained in the interval
$(p_0 + p_1 + \cdots + p_{i-1},\; p_0 + p_1 + \cdots + p_i)$, then an observation for category i
is recorded and sent back to subroutine FANCY.
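The partition of the unit interval used by MULT can be illustrated with a short sketch. It is not a transcription of the subroutine: the names `category_for` and `multinomial_counts` are hypothetical, and the final fallback to the last category is an added guard against rounding in the cumulative sum.

```python
import random


def category_for(u, probabilities):
    """Return the index of the subinterval of (0, 1) that contains u."""
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if u <= cumulative:
            return index
    return len(probabilities) - 1  # guard: cumulative sum fell short of 1


def multinomial_counts(trials, probabilities, rng):
    """Tally the categories of the given number of independent draws."""
    counts = [0] * len(probabilities)
    for _ in range(trials):
        counts[category_for(rng.random(), probabilities)] += 1
    return counts


rng = random.Random(2024)
print(multinomial_counts(10, [0.1, 0.2, 0.7], rng))
```

Repeating the single draw N (or M) times and tallying the categories yields one multinomial sample, which is how subroutine FANCY is described as using the routine.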


Subroutine  COMP  receives  a  realization  of  the  statistic  Q  from  subroutine 
FANCY.  The  number  of  observations  of  the  statistic  Q  belonging  to  the  interval 
of  integer  values  of  which  Q  is  a  member  is  incremented  by  one.  This  subroutine 
then  keeps  the  accumulation  of  values  for  Q  which  will  become  the  distribution 
of  Q. 
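The tabulation into 140 steps can be sketched from the step formula that appears in the program listing. The sketch assumes the step boundaries are the truncated values of (i - 71) times the mean weight times the product of the two trial counts divided by 70, less the first trial count times the power value, for i = 1 to 139, with a final overflow step. The names `step_edges` and `bin_index` are hypothetical, and indices here start at zero.

```python
def step_edges(weights, n_trials, m_trials, x=0.0):
    """Upper boundaries of the first 139 steps (integers, truncated)."""
    k = len(weights) - 1
    mean_weight = sum(weights) / k
    return [int((i - 71.0) * mean_weight * n_trials * m_trials / 70.0
                - n_trials * x)
            for i in range(1, 140)]


def bin_index(q, edges):
    """Index of the first step whose boundary is not exceeded by q."""
    for index, edge in enumerate(edges):
        if q <= edge:
            return index
    return len(edges)  # overflow step for values above every boundary


edges = step_edges([0, 1, 2, 3], 10, 5)
tally = [0] * 140
for q in (-40, 0, 0, 15, 500):
    tally[bin_index(q, edges)] += 1
print(edges[0], edges[70], edges[138], sum(tally))
```

Accumulating the tallies and dividing by the total count gives the discrete distribution function that the main program writes out.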
4. PROGRAM LISTING


      PROGRAM MAIN(INPUT,OUTPUT,TAPE5=INPUT,TAPE6=OUTPUT,TAPE8)
      COMMON P(15),N(15),M(15),PX(15),LQ,NBA(140),IVA(140),JM(15),AMN,
     ,XBA(140),IXB,XB,IU1,IU2,KQ(15)
  100 FORMAT(4I10,F10.3)
  200 FORMAT(1H ,140(I5,5X,F10.5/1X))
  300 FORMAT(15I5)
C     CARD 1: TRIAL COUNTS, SEED, NUMBER OF CATEGORIES, POWER VALUE
      READ (5,100) IU1,IU2,IRN,IXB,XB
C     CARD 2: CATEGORY WEIGHTS
      READ (5,300) (KQ(MS),MS=1,15)
      CALL RANSET(IRN)
      DO 9 MVT=1,140
    9 NBA(MVT)=0
C     AMN IS THE SUM OF THE WEIGHTS DIVIDED BY (CATEGORIES - 1)
      AA=0.
      DO 11 JBN=1,IXB
   11 AA=AA+KQ(JBN)
      AMN=AA/(IXB-1.)
C     IVA HOLDS THE INTEGER BOUNDARIES OF THE 140 STEPS
      DO 10 NOP=1,140
      TV=(NOP-71.)*AMN*IU1*IU2/70.-IU1*XB
   10 IVA(NOP)=INT(TV)
      K=1000
      DO 1 I=1,K
      CALL PROB
      A=0.
      DO 2 J=1,IXB
      LXR=KQ(J)
    2 A=A+LXR*P(J)
    1 CALL FANCY(A)
C     CONVERT THE STEP COUNTS TO A CUMULATIVE DISTRIBUTION
      AL=0.
      AA=0.
      DO 5 LJ=1,140
    5 AL=AL+NBA(LJ)
      XBA(1)=NBA(1)/AL
      DO 4 L=2,140
      MMBA=L-1
      AMX=NBA(L)/AL+XBA(MMBA)
    4 XBA(L)=AMX
      WRITE(8,200) ((IVA(IJU),XBA(IJU)),IJU=1,140)
      CALL WRITE
      STOP
      END

      SUBROUTINE MULT(I,R,J)
      COMMON P(15),N(15),M(15),PX(15),LQ,NBA(140),IVA(140),JM(15),AMN,
     ,XBA(140),IXB,XB,IU1,IU2,KQ(15)
      DIMENSION R(15)
C     RETURN IN J THE CATEGORY WHOSE SUBINTERVAL CONTAINS B
      A=0.
      B=RANF(I)
      DO 1 K=1,IXB
      A=A+R(K)
      IF(B.LE.A) L=K
      IF(B.LE.A) GO TO 2
    1 CONTINUE
    2 J=L
      RETURN
      END


      SUBROUTINE WRITE
      COMMON P(15),N(15),M(15),PX(15),LQ,NBA(140),IVA(140),JM(15),AMN,
     ,XBA(140),IXB,XB,IU1,IU2,KQ(15)
      WRITE(6,100) ((IVA(I),XBA(I)),I=1,140)
  100 FORMAT(1H1,139(21HLESS THAN OR EQUAL TO,
     ,I5,5X,F7.5/1X),21H         GREATER THAN,I5,5X,F7.5)
      RETURN
      END

      SUBROUTINE PROB
      COMMON P(15),N(15),M(15),PX(15),LQ,NBA(140),IVA(140),JM(15),AMN,
     ,XBA(140),IXB,XB,IU1,IU2,KQ(15)
      DO 3 J=1,15
    3 P(J)=0.
      A=0.
      KF=IXB-1
C     GENERATE P(IXB) DOWN TO P(2) BY INVERTING THE CONDITIONAL CDF
      DO 1 I=1,KF
      IX=IXB+1-I
      AX=IXB-I
      BX=1./AX
      X=(1.-RANF(I))**BX
      Y=1.-X
      P(IX)=(1.-A)*Y
    1 A=A+P(IX)
C     THE REMAINING PROBABILITY GOES TO THE FIRST CATEGORY
      P(1)=1.-A
      RETURN
      END

      SUBROUTINE CHANGE(J)
      COMMON P(15),N(15),M(15),PX(15),LQ,NBA(140),IVA(140),JM(15),AMN,
     ,XBA(140),IXB,XB,IU1,IU2,KQ(15)
      J=J-1
      L=JM(J)
      L=L-1
      K1=IXB-1
      DO 3 I=J,K1
    3 JM(I)=L+J-I
      DO 1 IJ=J,K1
      IX=K1+J-IJ
      IZ=JM(IX)
      IF(IZ.GT.1) L1=IX
      IF(IZ.GT.1) GO TO 4
    1 CONTINUE
    4 J=L1
      RETURN
      END

      SUBROUTINE COMP(K)
      COMMON P(15),N(15),M(15),PX(15),LQ,NBA(140),IVA(140),JM(15),AMN,
     ,XBA(140),IXB,XB,IU1,IU2,KQ(15)
C     COUNT K IN THE FIRST STEP WHOSE BOUNDARY IT DOES NOT EXCEED
      DO 1 I=1,139
      ZJ=(I-71.00)*AMN*IU1*IU2/70.-IU1*XB
      J=INT(ZJ)
      IF(K.LE.J) NBA(I)=NBA(I)+1
      IF(K.LE.J) GO TO 2
    1 CONTINUE
      NBA(140)=NBA(140)+1
    2 RETURN
      END

      SUBROUTINE FANCY(X)
      COMMON P(15),N(15),M(15),PX(15),LQ,NBA(140),IVA(140),JM(15),AMN,
     ,XBA(140),IXB,XB,IU1,IU2,KQ(15)
      L1=2
      K1=IXB-1
      DO 1 I=1,K1
      I1=K1-I+2
    1 JM(I)=I1
      J2=K1
      GO TO 2
    4 L1=JM(J2)
      IF (J2.EQ.1.AND.L1.EQ.1) GO TO 9
    2 GO TO 12
   14 IF(L1.LE.1) CALL CHANGE(J2)
      IF(L1.GT.1) JM(J2)=JM(J2)-1
      GO TO 4
C     COUNT THE NON-ZERO ENTRIES AND CLEAR THE PX VECTOR
   12 IN=0
      DO 8 I4=1,K1
      IL1=I4+1
      IF(JM(I4).GT.1) IN=IN+1
    8 PX(IL1)=0.
      B=0.
      DO 5 JK=2,IXB
      IXJ=JK-1
      JXA=JM(IXJ)
      CXQ=KQ(JXA)
      IF(JXA.GT.1) PX(JXA)=(X+XB)/(CXQ*IN)
      IF(JXA.LE.1) PX(JXA)=0.
    5 B=B+PX(JXA)
C     DISCARD THE SET WHEN THE PROBABILITIES SUM TO MORE THAN ONE
      IF(B.GT.1) GO TO 14
      PX(1)=1.-B
C     DRAW ONE HUNDRED PAIRS OF SAMPLES AND RECORD Q FOR EACH PAIR
      DO 7 IV=1,100
      DO 6 LMI=1,IXB
      M(LMI)=0
    6 N(LMI)=0
      IN=IV
      DO 11 J9=1,IU1
      CALL MULT(IN,PX,JV)
      IN=JV
   11 M(JV)=M(JV)+1
      IO=IV
      DO 15 K9=1,IU2
      CALL MULT(IO,P,KV)
      IO=KV
   15 N(KV)=N(KV)+1
      NQA=0
      MQA=0
      DO 3 JA=2,IXB
      LM=KQ(JA)
      MQA=MQA+LM*M(JA)
    3 NQA=NQA+LM*N(JA)
      LQ=(NQA*IU1)-(MQA*IU2)
    7 CALL COMP(LQ)
      GO TO 14
    9 RETURN
      END
5.  INPUT  REQUIREMENTS


Input  requirements  for  the  program  consist  of  two  cards.  We  use  the  first 
card  to  input  the  following  information: 

a.  The  number  of  trials  for  each  multinomial  distribution, 

b.  The  number  of  categories, 

c.  A  random  number  generator  initializer,  and 

d.  A  value  for  calculating  the  power  of  the  test. 

We  use  the  second  card  for  assigning  weights  to  the  individual  categories. 

Card 1: Format (4I10, F10.3)

• Number of trials for the multinomial distribution with parameter vector p,

• Number of trials for the multinomial distribution with parameter vector p*,

•  Random  number  generator  initializer.  (Any  integer  number  will 
suffice.  Use  of  the  same  integer  generates  a  duplicate  string  of  random  numbers.), 

•  The  number  of  categories  in  both  distributions  (this  number  must 
be  an  integer  between  3  and  15,  inclusive),  and 

• A real number (x) to be used for power calculations. The program generates the
distribution of the statistic Q when the parameters have the property

$$\sum_{i=0}^{k} a_i \,(p_i - p_i^*) = x.$$

Card 2: Format (15I5). Weights are given as integer values with zero
being  the  first  weight.  Only  weights  for  the  number  of  categories  specified 
on  the  first  card  are  necessary. 
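
The two card layouts above can be illustrated with a short sketch. It is not part of the original program: it is a Python illustration that slices card images into the fixed-width fields implied by the formats (4I10, F10.3) and (15I5). The function names and dictionary keys are hypothetical, and the field order is assumed to follow the bulleted list for Card 1. Unlike FORTRAN formatted input, this sketch does not read a blank field as zero.

```python
def parse_card1(card):
    """Split a Card 1 image laid out as FORMAT (4I10, F10.3).

    Assumed field order (from the bulleted list): trials for p,
    trials for p*, random number initializer, number of categories, x.
    """
    card = card.ljust(50)
    trials_p, trials_pstar, seed, categories = (
        int(card[i:i + 10]) for i in range(0, 40, 10)
    )
    x = float(card[40:50])
    if not 3 <= categories <= 15:
        raise ValueError("number of categories must be between 3 and 15")
    return {
        "trials_p": trials_p,
        "trials_pstar": trials_pstar,
        "seed": seed,
        "categories": categories,
        "x": x,
    }


def parse_card2(card, categories):
    """Split a Card 2 image laid out as FORMAT (15I5).

    Only the first `categories` weights are read; the first must be zero.
    """
    card = card.ljust(5 * categories)
    weights = [int(card[i:i + 5]) for i in range(0, 5 * categories, 5)]
    if weights[0] != 0:
        raise ValueError("the first weight must be zero")
    return weights


# Illustrative card images (values chosen for the example only).
card1 = "%10d%10d%10d%10d%10.3f" % (10, 5, 12345, 4, 0.0)
card2 = "%5d%5d%5d%5d" % (0, 5, 8, 9)
params = parse_card1(card1)
weights = parse_card2(card2, params["categories"])
```

Slicing by column rather than splitting on blanks mirrors how fixed-format cards are read, so adjacent fields need no separator.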
6.  SAMPLE  DISTRIBUTIONS

N = 10    M = 5

\sum_{i=0}^{3} a_i (p_i - p_i^*) = 0

Weights:  a_0 = 0,  a_1 = 5,  a_2 = 8,  a_3 = 9

     u    P(Q ≤ u)         u    P(Q ≤ u)         u    P(Q ≤ u)

  -205     .01368        -67     .22514         70     .79115
  -202     .01368        -64     .24274         73     .79115
  -199     .01572        -61     .24274         76     .80556
  -196     .01572        -58     .26095         79     .80556
  -193     .01792        -55     .27950         82     .81963
  -190     .02032        -52     .27950         85     .83309
  -187     .02032        -49     .29957         88     .83309
  -184     .02295        -46     .29957         90     .84654
  -181     .02295        -44     .31936         93     .84654
  -178     .02610        -41     .31936         96     .85771
  -176     .02610        -38     .34067         99     .85771
  -173     .02910        -35     .36200        102     .86878
  -170     .03271        -32     .36200        105     .87898
  -167     .03271        -29     .38338        108     .87898
  -164     .03626        -26     .38338        111     .88889
  -161     .03626        -23     .40542        114     .88889
  -158     .04105        -20     .42765        117     .89821
  -155     .04616        -17     .42766        120     .90729
  -152     .04616        -14     .45028        123     .90729
  -149     .05172        -11     .45028        126     .91514
  -146     .05172         -8     .47311        129     .91514
  -143     .05725         -5     .49519        132     .92299
  -140     .06357         -2     .49519        134     .92299
  -137     .06357          0     .52051        137     .92996
  -134     .07070          2     .52051        140     .93633
  -132     .07070          5     .54211        143     .93633
  -129     .07835          8     .54211        146     .94221
  -126     .07835         11     .56437        149     .94221
  -123     .08614         14     .56437        152     .94747
  -120     .09494         17     .58516        155     .95222
  -117     .09494         20     .60637        158     .95222
  -114     .10463         23     .60637        161     .95721
  -111     .10463         26     .62808        164     .95721
  -108     .11538         29     .62808        167     .96149
  -105     .12634         32     .64830        170     .96531
  -102     .12634         35     .66750        173     .96531
   -99     .13816         38     .66750        176     .96896
   -96     .13816         41     .68812        178     .96896
   -93     .15006         44     .68812        181     .97222
   -90     .16397         46     .70784        184     .97222
   -88     .16397         49     .70784        187     .97510
   -85     .17757         52     .72618        190     .97783
   -82     .17757         55     .74246        193     .97783
   -79     .19305         58     .74246        196     .98017
   -76     .19305         61     .75841        199     .98017
   -73     .20857         64     .75841        202    1.00000
   -70     .22514         67     .77492
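
A table of this kind can be approximated by direct simulation. The following is a minimal Python sketch, not the authors' program. It assumes the statistic has the form used in subroutine FANCY, LQ = (NQA*IU1) - (MQA*IU2), where MQA and NQA are the weighted sums of the two multinomial count vectors, drawn with IU1 and IU2 trials respectively. The function names, the equal category probabilities, and the number of replications are illustrative assumptions. The sketch also fixes the two probability vectors, whereas the listing cycles through many configurations.

```python
import random
from bisect import bisect_right


def sample_counts(rng, probs, trials):
    """Draw one multinomial count vector by repeated categorical sampling."""
    counts = [0] * len(probs)
    for _ in range(trials):
        u, acc = rng.random(), 0.0
        for j, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[j] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point round-off
    return counts


def q_statistic(weights, m_counts, n_counts):
    """Q = IU1 * sum(a_j * N_j) - IU2 * sum(a_j * M_j), as in FANCY."""
    iu1, iu2 = sum(m_counts), sum(n_counts)
    mqa = sum(a * c for a, c in zip(weights, m_counts))
    nqa = sum(a * c for a, c in zip(weights, n_counts))
    return nqa * iu1 - mqa * iu2


def simulate_q(weights, px, iu1, p, iu2, replications, seed):
    """Return `replications` simulated values of Q for fixed px and p."""
    rng = random.Random(seed)
    return [
        q_statistic(weights,
                    sample_counts(rng, px, iu1),
                    sample_counts(rng, p, iu2))
        for _ in range(replications)
    ]


def empirical_cdf(values, grid):
    """Fraction of simulated values less than or equal to each grid point."""
    ordered = sorted(values)
    return [bisect_right(ordered, u) / len(ordered) for u in grid]


weights = [0, 5, 8, 9]
equal = [0.25] * 4          # p = p*, so sum a_i (p_i - p_i*) = 0
qs = simulate_q(weights, equal, 10, equal, 5, replications=2000, seed=1)
cdf = empirical_cdf(qs, [-450, 0, 450])
```

With weights no larger than 9 and 10 and 5 trials, every simulated Q lies between -450 and 450, so the last grid point always has cumulative probability one.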
ABSTRACT. A simulation model is being developed at the Construction Engineering
Research Laboratory for the support of the continuing assessments of the Army Military
Construction (MCA) process by the Corps of Engineers. This MCA process simulation
model provides a functional representation and statistical measurements of system
response at both the local and global levels, and is intended to be an analytical
tool for researchers investigating this complex process and for management in
understanding and controlling it. All MCA projects proposed for an Army Budget Year
Program are processed by the model through all appropriate performance and approval
levels to their final determination. First goals of the model development have related
to time and cost level determinations for performance functions, specifically the
planning, programming, design, and construction of military facilities. These
assessments are ultimately related to the timeliness and economics of the construction
product.

1. INTRODUCTION. The Army Military Construction (MCA) process is a term used
to identify the web of interacting procedures, management systems, and
requirements/regulations which bring a needed construction project from an Army
installation's proposal through all intervening development stages to the turnover of
the finished facility.

The Corps of Engineers (CE) is the largest construction agency in the world,
processing $10 billion in construction for Fiscal Year 1982 (FY82) alone. The military
construction portion of this budget is nearly $1 billion and holds consistently
at about 10% of the CE budget. The MCA process utilizes this substantial budget to
provide needed military facilities; any improvement in the MCA process will be
economically significant if it impacts the quality, timeliness, or price of the
facilities delivered.


*Visitor  at  the  University  of  Illinois. 
Analyzing the MCA process can be complicated by the multidimensional nature of
the process and can be made even more difficult when multipurpose objectives are
included in the assessment task. The existing MCA process reflects the past
evolutionary needs of Corps construction management and the hierarchical organization
which implements this management. This results in back-and-forth transfers of
responsibility over many layers of authority. Furthermore, the complexity of the
processing systems and the number of projects handled make it difficult to develop an
analytical grasp of the entire MCA operations network and the problems such a network
can sometimes encounter. A convenient method of visualizing such complex processes
is through network representations.

2. MODELING THE MCA PROCESS. Although network representations of the MCA process
should be developed at a level of detail appropriate to the requirements of the
intended analysis, "real-world" constraints must be considered. There will usually
be analytical time limitations, network size limitations, unavailability of
functional data (data requests can disrupt field operations), and the hierarchical
orientation of the analyst. Even when some of these constraints can be accommodated, the
inherent complexity of the MCA process cannot be avoided. It is this complexity in
both the representing and the represented that discourages a formalized and
consistent approach in assessing MCA processing problems.

A proper modeling of the MCA process depends upon a recognition of the many
aspects or "dimensions" associated with this development (Figure 1). Requirements
for replicating the hierarchical process with the appropriate logic at an effective
level of detail are fundamental to the analysis. In addition, the approach must be
reasonably convenient and sufficiently responsive to permit the investigator to
develop timely solutions.

The logic of the functional relationships of the MCA process can be "sculptured"
into the network format so as to be visibly representative of the process environment
and demonstrably responsive in meeting analytical requirements. It was determined
that the modeling of the total MCA process will be most conveniently achieved through
a "procedural network." A "procedural network" is defined as a network in which
the nodes indicate the decision-points/functions and the arcs represent the flow of
responsibilities and/or information associated with project development. Hence, this
network identifies events of interest (decision-points/functions) by nodes or boxes,
with the task processing at any particular decision node dependent upon the stochastic
history of the project and the network logic up to that point. The precedence
relationship in the network reflects the organizational hierarchy and the typical or
required sequences for processing MCA projects. Precedence requirements define the
network configuration and are controlled by such factors as management policy, task
characteristics, and procedural regulations. As indicated in Figure 1, there must be
a hierarchical representation, a level of detail, and an extent of coverage developed
for the network which respectively reflect the precedence relationships, sensitivity
requirements, and a range of interest applicable to the particular MCA problem under
study.
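
The definition of a procedural network given above can be made concrete with a small sketch. It is written in Python for illustration and is not CEGNS. The node names, duration ranges (in days), and branch probabilities are hypothetical, and the branching here uses fixed probabilities, a simplification of the history-dependent decision logic described in the text.

```python
import random

# Hypothetical procedural network: each node (decision point/function) has a
# processing-time range in days and a list of outgoing arcs with probabilities.
NETWORK = {
    "proposal": {"duration": (5, 15), "next": [("review", 1.0)]},
    "review": {"duration": (10, 30),
               "next": [("design", 0.8), ("rejected", 0.2)]},
    "design": {"duration": (60, 180),
               "next": [("contract", 0.9), ("review", 0.1)]},
    "contract": {"duration": (0, 0), "next": []},   # terminal node
    "rejected": {"duration": (0, 0), "next": []},   # terminal node
}


def run_project(network, start, rng, max_steps=1000):
    """Route one project through the network; return (path, elapsed days)."""
    node, elapsed, path = start, 0.0, [start]
    for _ in range(max_steps):
        spec = network[node]
        low, high = spec["duration"]
        elapsed += rng.uniform(low, high)      # stochastic processing time
        if not spec["next"]:                   # no outgoing arcs: finished
            return path, elapsed
        targets = [target for target, _ in spec["next"]]
        probs = [prob for _, prob in spec["next"]]
        node = rng.choices(targets, weights=probs)[0]
        path.append(node)
    raise RuntimeError("project did not reach a terminal node")


rng = random.Random(0)
results = [run_project(NETWORK, "proposal", rng) for _ in range(200)]
contracted = [days for path, days in results if path[-1] == "contract"]
```

Collecting the elapsed times of many simulated projects, as in `contracted`, gives the kind of processing-time distribution that the statistical outputs of a network simulator summarize.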
These precepts have permitted the development of an MCA process network which
represents all significant procedural paths that can occur in the process. Figure 2
identifies the scope of this network up to construction contracting. In interpreting
this diagram, it should be understood that the flow of one project or all projects
can be considered, and project entities or segments of projects may be processed
according to local procedural needs. This is a project controlled process. The
characteristics of the construction projects processed by the network must control
the functional response, the decision points, and the flow paths of projects through
the network. These relationships, ideally determined from statistical descriptions
of actual MCA process experience, are at the least related to facility type (project
complexity) and size (project price). The complexity of these factors and the number
of projects involved in the process are not easily handled by any means other than by
digital computer simulation techniques.

3. SIMULATION OF THE MCA PROCESS. An effective simulation of the MCA process
depends on the development of a representative network and an efficient and compatible
simulator system. The simulation modeling approach can provide statistical
descriptions of the MCA process derived from either direct functional or surrogate
representations which are correlatable to the "real-world" process (Figure 3). Thus,
the model is an analytical tool providing rapid answers in a standardized form to the
analyst, but still subject to his reasoned interpretation. Basic to these assessments
are the procedural efficiencies and time/cost/resource-impact considerations
associated with the functional performance of the process. Performance-level
assessments of this process can only be achieved by a simulation model which is capable of
processing sophisticated procedural networks.

A simulation system which could meet all of the requirements associated with
analyzing the MCA process was not a "shelf" item. Of the several simulation
languages which are available, the Generalized Network Simulator (GNS), as initially
developed at the University of Illinois, was selected. This computer simulation
program is written in FORTRAN IV and was designed to simulate generalized stochastic
networks primarily as applied to manufacturing processes. The GNS approach was
adapted and modified at the Construction Engineering Research Laboratory (CERL) to
support procedural networks and MCA process assessment requirements. The computerized
program permitting this simulation is called the Corps of Engineers Generalized
Network Simulator (CEGNS), and represents a parallel development to the Generalized
Manufacturing Simulator (GEMS) System. Both CEGNS for simulating procedural networks
and GEMS for simulating product-assembly type networks are derived from the GNS
concept, as proposed by Hogg, Dessouky, and Tonegawa in 1977.

Simulation Model Features. The MCA process simulation model is seen as being a
computerized representation of a procedural network which depicts the functional steps
involved in bringing an Army construction project from the proposal state to a
finished and accepted facility. The model must simulate the total project flow,
including project acceptance/rejection; it must approximate the impact of project
characteristics and volume (workload) levels; and it must allow for interactions between
performance functions as well as other dollar and time significant effects. This
modeling system will enable the analyst to create a simulation network under nominal
rules and restrictions and to apply this algorithm to his problem areas without the
inconvenience of diversionary tasks. He should not, for instance, have to face a
monumental computer programming job when there is a requirement to simulate an
extensively revised network. A good computer simulation program will effect an efficient,
dynamic handling of network logic, permitting simulation network definitions as well
as problem requirements to be input in relatively simple statements.

The Simulator. The CEGNS simulation system was developed to effectively accomplish
general MCA simulation goals without restricting model growth and expanding
requirements. The CEGNS and MCA model developments have been iteratively improved by
repeated testing and modifications (Figure 4). The resulting model is a functioning
version of the MCA model which has evolved with steadily improving sensitivity. The
product of this development was a Study Model utilized as a development vehicle for
establishing the criteria applicable to future operational versions of the model. A
fundamental feature of the CEGNS simulation modeling system is its capability to
replicate a procedural network. Figure 5 shows the distributions of actual and
simulated AE design times from Sacramento District data for FY80 projects. The
distribution predicted by the MCA simulation model in this figure closely approximates the
distribution of the "real-world" process.

4. STUDY MODEL. The Study Model has been used to test the assumptions and
verify the conclusions of the simulation study. Although the model is incrementally
improved as the study advances, its basic features and application goals developed
for evaluating the MCA process have not changed.

MCA  Assessment  Features.  Special  features  of  the  Study  Model  include: 

(a) Time/cost/resource expenditure measurements for each process function.

(b) Full MCA process network representation.

(c) Global/local frame-of-reference capability.

(d)  Special  performance-indicator  and  display  outputs. 

Other special operating features include a selectability in input and output formats
and a choice of data processing aids especially developed for MCA analysis
requirements. Printouts of MCA simulation runs will include many useful and informative
features for the analyst, such as echo-checks of the input, a listing of projects
supplied or generated, in-process traces, and output summaries in statistical format.
Statistical summaries can be provided as follows:

(i) The number of projects held up, lost (if any), and processed at all queue
boxes.

(ii) A listing of all projects passing through any selected activity box.

(iii) A histogram of the time-intervals of all projects flowing between specified
activity boxes.

(iv) A bar chart of the number of projects waiting to be processed at any
specified queue box over any specified time period.

(v) An accumulating event-calendar of projects (number of projects processed
vs. absolute time) for any phase(s) of the MCA process selected for
analysis.

Example Applications. Trial assessments of the MCA process by the study model are
now discussed to illustrate the power of the simulation approach. The first problem
to be examined is definitive in nature and initially requires the type of simulation
used for evaluating new procedures with limited impact. The second problem proposes
a change with a far-reaching impact on all high cost projects; the relaxation of
three process constraints is involved; and more sophisticated modeling features are
required.

(a) A Simplified Assessment Problem. It is desirable to compare the potentially
shorter construction procedures associated with prefabrication and
industrialized-building approaches to the traditional procedures which
utilize custom designs and on-site preparation of structural assemblies.

The procedures will be compared on the basis of overall processing times.

This may be considered first as a problem in establishing basic differences in
processing efficiencies (to be followed by global reviews based on the verified
precepts). A needed simplification is the processing only of projects relevant to the
study. This eliminates the imposition of unwanted variables such as the scheduling
assumptions, sequential-processing effects and district workload considerations
implied by a total project environment. A very simple initial approach could
consider two facility classes (low to high complexity) and two size (price) categories,
small and large. This results in an easily controlled study of four projects
processed by a minimum of two computer runs, one for each of the construction
approaches. The combined results shown in Figure 6 identify the time advantages of
the industrialized building approach as an illustration of the analytical use of
CEGNS outputs.

(b) Multiple Constraint Problem. A hypothetical proposal is assumed which
requires a procedural change in the approval of high cost Architect and
Engineering (AE) contracts over $500,000. In the present procedure, all AE
contracts greater than $200K are reviewed by the Division Engineer; all AE
contracts greater than $500K are reviewed by the Office of the Chief of
Engineers (OCE); and all AE contracts of $1 million or more are also reviewed by
the Department of Defense (DOD). In the hypothesized procedure, only the
Division Engineer reviews high cost AE contracts, with OCE notified and
coordinated with whenever contracts of $1 million or more are processed. A
cost benefit determination for the proposed change is requested.
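
The two approval procedures can be written out as a small rule function. This is a Python sketch for illustration, not part of CEGNS. The function and label names are hypothetical. It assumes that the DOD review applies at $1 million or more, and that the Division Engineer review threshold of $200K is unchanged under the hypothesized procedure.

```python
def required_reviews(ae_fee, procedure="present"):
    """Return the reviews an AE contract of `ae_fee` dollars must pass."""
    reviews = []
    if procedure == "present":
        if ae_fee > 200_000:
            reviews.append("Division Engineer")
        if ae_fee > 500_000:
            reviews.append("OCE")
        if ae_fee >= 1_000_000:      # assumed to mean $1 million or more
            reviews.append("DOD")
    elif procedure == "hypothesized":
        if ae_fee > 200_000:         # assumed unchanged from the present rule
            reviews.append("Division Engineer")
        if ae_fee >= 1_000_000:
            reviews.append("OCE notification")
    else:
        raise ValueError("procedure must be 'present' or 'hypothesized'")
    return reviews


# Reviews removed by the hypothesized change for a $561K AE fee.
before = required_reviews(561_000, "present")
after = required_reviews(561_000, "hypothesized")
removed = [r for r in before if r not in after]
```

Only contracts above $500K lose a review level under the change, which is why the cost benefit determination concentrates on the high cost AE projects.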
A "mix" of 500 proposed construction projects is statistically generated, each
with ten descriptors defining the unique characteristics of each project. Figure 7
gives a sampling of these synthesized projects.

In  simulating  these  projects,  it  can  be  seen  that  in  the  winnowing  assessments 
by  the  Major  Commands  (MACOMs)  and  OCE,  the  number  of  projects  has  decreased  to  380. 
These  "official"  FY  projects  are  distributed  to  five  divisions  representing  the  ten 
districts  which  process  MCA  projects.  A  district  was  "selected,"  and  100  of  the  380 
projects  were  processed  by  this  district.  Three  comparative  runs  were  made  for  each 
assessment: 

(i)  Normal  configuration  run. 

(ii)  Hypothetical-change  configuration  run.

(iii)  "No  restriction"  (fast-track)  configuration  run. 

In these three simulation runs, all configuration and branching factors as well as
controllable stochastic events are kept the same, except for the simulated
conditional approvals for the projects as required by the problem.

As part of the output for each of the runs, planning durations, programming durations
and design times were statistically plotted for all processed projects. (See Figure
8 for a typical output from a normal run as an example of these products.) In
addition, a programming output calendar for the 27 projects with an AE fee above $200,000
which completed final design was generated for each run (Figure 9). This project
output display illustrates the time differences between the normal (basic), hypothetical
(no upper AE approval), and limiting (no restrictions) runs. As shown in Figure 10,
there are more significant savings in projects with longer design processing times.
Table I summarizes the 27 projects and their impact on the MCA process. For the
assumptions made, it was determined that the cost of the net delay per high cost
project was approximately $60,600, where "delay" implies the time difference between the
normal and hypothetical runs. Under the assumptions made for this hypothetical
proposal, $60,600 amounts to approximately a 1% saving, which is probably not
sufficiently significant to justify the change, even if feasible under other
considerations. Such results can assist a decision maker in evaluating proposed changes to
the MCA process.

Evaluation of Study Model Capabilities. It is evident from Study Model applications
and experience that the simulation approach provides a powerful tool for analyzing
the MCA process. If sources of real-world response data can be developed, the
present model can be advanced further through verification and updating procedures.

Table I

Significance of Hypothetical Change

                                          27 Project
Project    Δ      Contract    AE    Adjusted    Fac.-Use     All 27     Hi Cost(AE)
  No.     Hrs.     Price      Fee    Delay:      Value      Projects     Projects
                     $M       $K      Days       $/Day      Delay $      Delay $

  209      20         5       229       3          715        2145
   86      10         7       275       2         1000        2000
  146       0         6       238       0          860           0
   71     -10         7       264      -2         1000       -2000
  255     -20         7       290      -3         1000       -3000
  103     -10         8       301      -2         1143       -2286
   81     -30        11       433      -4         1590       -6360
   74     -40         7       291      -5         1000       -5000
  309      70        10       206       9         1430       12870
  100      60        11       456       8         1590       12720
  414      70         8       327       9         1143       10287
  122      80        12       251      10         1714       17140
  138     130        12       248      17         1714       29138
  133     150        13       274      19         1860       35340
  113      80        12       488      10         1714       17140
   83      80        11       457      10         1590       15900
  121      40        16       334       5         2290       11450
   75      60        14       286       8         2000       16000
  175*    110        14       561      14*        2000       28000        28000
  284     120        19       396      15         2714       40710
   87     -30        18       364      -4         2571      -10284
  213*    242        22       871      31*        3143       97433        97433
  105*    120        14       550      15*        2000       30000        30000
   35     140        27       225      18         3857       69426
  327     211        13       266      27         1857       50139
   23*    280        22       890      35*        3143      110005       110005
   10       0        25       258       0         3571           0

                                               PW:          578913       265438*
                                     Future Worth:       6,669,077    3,057,846
                                            FW-PW:       6,090,164    2,792,408

Significance
                                      Net Delay $:        $528,660     $242,400
                              Net Delay $/Project:         $19,600      $60,600

Net Delay $ = PW [1 - (1+i)^{-n}];  i = 8% inflation + 5% interest.
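
The footnote formula can be checked against the totals in Table I with a few lines of arithmetic. This Python sketch is an illustration only. The rate i = 0.13 is the sum of the stated 8% inflation and 5% interest. The period n = 20 is an assumption: it is not stated in the table, and is inferred here from the ratio of Future Worth to PW, which is about 11.52, close to (1.13)^20. The function name is hypothetical.

```python
def net_delay_cost(present_worth, rate=0.13, periods=20):
    """Net Delay $ = PW * [1 - (1 + i)^(-n)].

    rate = 0.13 is 8% inflation + 5% interest (from the table footnote).
    periods = 20 is an inferred assumption, not a value given in the table.
    """
    return present_worth * (1.0 - (1.0 + rate) ** (-periods))


all_projects = net_delay_cost(578_913)        # PW for all 27 projects
high_cost = net_delay_cost(265_438)           # PW for the 4 high cost projects
per_project_all = all_projects / 27
per_project_high = high_cost / 4
```

Under these assumptions the computed values agree with the tabulated Net Delay figures to within the rounding of the compound-interest factor.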
5. CONCLUSIONS. The types of engineering network models which can effectively
represent the MCA process and the scope of the computer simulation developments to
implement these representations have been brought into focus by the current study.
It has been concluded that the CEGNS computer simulation system will support a
simulation model of the total MCA process at all required levels of detail. The
simulation approach selected will permit rapid assessments of proposed changes in
"real-world" procedures, and can be used for diagnosing most problems that arise in
operations at the management and field performance levels.

Existing Capabilities. The Study Model has replicated MCA procedures from the time
of project formulation to construction completion. The Study Model demonstrated the
capacity of the CEGNS simulation modeling system to support the required levels of
analysis. CEGNS requires relatively little change in the input stream for solving
two problems with minor differences. This capability allows the analyst to vary both
the input requirements of the problem and the problem itself in order to determine
any instability or lack of response in the solution. The adequacy of the structural
and processing concepts of the Study Model has been demonstrated. Verification of
this model from detailed operations information, plus computer-processing efficiency
adjustments, can result in a "Prototype" MCA Process Simulation Model requiring only
a final calibration and validation phase before delivery as an "operational"
model.

Future Developments. The MCA Simulation Model will have a broad applications
potential. The current Study Model has been demonstrated in the research mode and can now
be used in comparative investigations such as measuring the relative impact of
proposed changes in the MCA process. Experience gained from exercising the Study Model
has provided the base (criteria) for developing an MCA Process Simulation Model of
greatest benefit to the Corps. This model can contribute to improved procedures
analysis; improved management control (impact forecasting, optimal product
scheduling, resource allocation studies, etc.); and the assessment of special problem areas.
Confidence in the output of an "operational" MCA Process Simulation Model will be
strengthened if it is supported by a significant data bank which has been developed
in response to determined modeling needs. This data bank should contain key
information that correlates external influences and events with the decision
functions which impact the MCA process. The data bank can contribute to the creation of
a more realistic generation of projects and attribute assignments for these projects.
It will be a source for calibration values to be applied to network logic and
subsequent adjustments to this logic. Finally, it will be a principal reference in
verification arguments.

Future development objectives require the refinement of model capabilities to
that of a convenient and useful analytical tool (see Table II). The model must
represent program fluctuations or imposed procedural changes with a precision
sufficient to match computer responses to the corresponding "real-world" response. These
refinements will permit the convincing demonstrations required of an operational
system. Only through such demonstrations can the level of confidence be promulgated
which is necessary for the model output to be considered a primary argument in CE
evaluations of the MCA process.


Table  II 

MCA  Process  Simulation  Model  Development  Objectives 

1.  Replicate  network  flow. 

2.  Perform  time/cost/resource  expenditure  measurements. 

3.  Approximate  the  performance  response  of  process  functions  in  the  network 
to  real-world  response. 

4.  Incorporate  macro/micro  network  focusing  (modules  with  graded  detail). 

5.  Consider  study-support  flexibility;  provide  selectable  outputs. 

6. Allow for exogenous influences.


Figure 2. Network diagram of the MCA process to construction contracting.
(Diagram labels: Upper Echelon Interfaces; OCE/CRRC Reviews & Control;
MACOM/Installations; Division; Districts; Total MCA Process.)

Figure 3. A simulation concept for the MCA process. (Diagram label: Output.)

Figure 4. The iterative development of an MCA simulation modeling system.
(Diagram labels: Modeling; Simulation; Verification; Calibration; Validation.)

Figure 5. AE design-time distributions for FY80 projects from a
Study Model simulation run and from actual performance
records. (Plot title: AE Design Completion Times for SPK FY80 Projects.
Legend: o simulated design time; • actual design time.)

Figure 6. Simulated MCA processing times prior to construction for industrialized-building
and site-built facility options (4 projects).

Figure 7. Sample from printout of a list of synthesized projects
and their attributes.

Figure 8. Simulated planning-phase times for 500 MCA projects.

(Plot axis: design-completion date, by quarter: 4th Quarter, 1st Quarter, 2nd Quarter.)


ABSTRACT. This paper discusses the solution of the boundary layer
equations  at  the  intersection  of  a  hemispherical  blast  wave  and  a  ground  plane 
over  which  it  is  moving.  A  coordinate  stretching  transformation  is  used  to 
eliminate  the  singularity  at  the  shock/surface  interface  and  reduce  the 
governing  equations  at  the  interface  to  a  set  of  ordinary  differential 
equations.  The  solution  of  the  equations  at  the  interface  is  presented. 

The  original  governing  equations  consist  of  a  set  of  three,  coupled, 
nonlinear  partial  differential  equations.  The  coordinate  transformation 
produces  a  set  of  three,  coupled  nonlinear  ordinary  differential  equations  at 
the  shock/ surface  interface.  Boundary  conditions  are  given  at  the  surface  and 
at  an  infinite  distance  from  the  surface,  forming  an  asymptotic  two  point 
boundary  value  problem.  A  method  developed  by  Nachtsheim  and  Swigert  is 
employed  to  reduce  the  problem  to  an  initial  value  problem.  An  initial 
estimate  for  two  unknown  gradients  at  the  wall  is  made  and  an  iterative 
method  is  used  to  systematically  reduce  the  mean  square  error  in  the  solution 
at  the  outer  edge  of  the  boundary  layer  by  changing  the  estimates  of  the 
gradients  at  the  surface.  This  method  requires  the  formulation  and  solution 
of  six  additional  ordinary  differential  equations  which  are  coupled  to  the 
first  three.  A  Kutta-Merson  method  is  used  to  solve  the  nine  coupled 
equations.  The  result  is  a  solution  to  the  original  equations,  plus  a 
correction  to  the  two  estimates  of  the  gradients  at  the  surface.  The 
iteration  procedure  is  repeated  until  the  solution  at  the  outer  edge  of  the 
boundary  layer  agrees  with  the  given  conditions  to  within  some  specified 
degree  of  accuracy. 

NOMENCLATURE.

$r$    radial distance from center of the hemispherical shock
$\omega$    angle measured from the surface upwards
$R_s$    radius of the shock front
(symbol illegible in source)    velocity of the shock front
$U_s$    particle velocity at the shock front
$T_s$    temperature at the shock front
$P_s$    pressure at the shock front
$\rho_s$    density at the shock front
$T_a$    ambient temperature
$\rho_a$    ambient density
$\nu_a$    kinematic viscosity of the ambient air
$\mu_a$    viscosity of the ambient air
$P_r$    Prandtl number of the ambient air
$C_p$    specific heat of the ambient air
$C_R$    gas constant for air
$K$    thermal conductivity of air
$T_w$    temperature at the surface of the ground
$\eta$    transformed angular distance, $\eta = r\omega\sqrt{U_s/[\nu_a(R_s - r)]}$
    (square-root factor reconstructed from a partly illegible original)
$\bar U_r$    transformed radial velocity, $\bar U_r = U_r/U_s$
$\bar V_\omega$    transformed angular velocity,
    $\bar V_\omega = (V_\omega/U_s)\sqrt{(R_s - r)U_s/\nu_a}$
    (partly reconstructed from an illegible original)
$\bar\rho$    transformed density, $\bar\rho = \rho/\rho_s$
$\bar T$    transformed temperature, $\bar T = T/T_s$
$\xi$    normalized radial distance, $\xi = r/R_s$
$\tau$    normalized time, $\tau = tU_s/R_s$
$\bar T_w$    transformed surface temperature, $\bar T_w = T_w/T_s$
$\Psi = \bar U_r$    transformed radial velocity
$\Phi = \bar V_\omega$    transformed angular velocity

1. INTRODUCTION. When a nuclear weapon or high explosive charge is
detonated in the atmosphere, a blast wave is formed. At the front of the
blast wave is a shock surface which separates two regions of gas in different
states. In front of the expanding shock surface, the gas is at rest and at
ambient pressure and temperature. As gas is engulfed by the moving shock, it
undergoes a discontinuous jump to higher pressure, temperature and velocity.

In  the  blast  wave  behind  the  shock,  the  pressure  and  velocity  decay  with 
distance  from  the  shock.  In  strong  blast  waves  near  the  detonation  point, 
the  temperature  increases  with  increasing  distance  from  the  shock. 

In order to study the strong blast waves which are formed just beyond the
fireball region of a large explosion, a point detonation model is often
used. In the point detonation model the physics of the detonation is ignored
and a spherical blast wave is considered to be formed by the instantaneous
deposition of a large amount of energy at a point in space. The first
analytical solution for the motion of a strong blast wave formed by a point
detonation was the Taylor-Sedov similarity solution (Reference 1). The
Taylor-Sedov solution gives the motion of a spherical shock front and the
pressure, temperature and velocity of the gas behind the front. The solution
can be modified to solve the problem of a point detonation on a flat surface.
In this case the effective energy is doubled and the shock surface becomes
hemispherical (see Figure 1).
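
For orientation, the strong-shock scaling usually associated with the
Taylor-Sedov solution can be written as below. This is the textbook form, not
a formula quoted from this paper; $E$ (the deposited energy) and $\xi_0$ (a
dimensionless constant of order one that depends on the ratio of specific
heats) are symbols introduced here for illustration.

```latex
R_s(t) = \xi_0 \left( \frac{E\,t^{2}}{\rho_a} \right)^{1/5},
\qquad
\frac{dR_s}{dt} = \frac{2}{5}\,\frac{R_s}{t}
```

Under this scaling, replacing $E$ by $2E$ for a surface burst enlarges the
shock radius at a given time by a factor of $2^{1/5}$, or about 1.15.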

An implicit assumption in the Taylor-Sedov solution for a hemispherical
blast wave is that the flow is inviscid. All real gases are viscous, however.
To accurately model the blast wave flow near the surface, viscous effects
must be taken into account.

When a viscous gas flows past a stationary surface, the gas directly
adjacent to the surface sticks and remains at rest. The stationary gas at
the surface acts to decelerate the gas adjacent to it, which in turn
decelerates more gas. In this manner, a small but finite transition region develops
in which the gas goes from zero velocity at the surface to the undisturbed
velocity far from the surface. An analogous temperature transition region
also develops in the flow. For an isothermal surface, the gas directly
adjacent to the surface remains at surface temperature. The temperature of
the gas beyond the surface increases through a small transition region until
it reaches the undisturbed flow temperature far from the surface. The
velocity and temperature transition regions are known as the velocity and thermal
boundary layers. Both types of boundary layers are present in blast wave
flow.


The boundary layer flow behind the shock surface starts out as a laminar
flow, i.e. a flow in which disturbances tend to die out. As the flow moves
away from the shock, it very rapidly becomes a turbulent flow, i.e. a flow in
which disturbances tend to grow. Because this paper is concerned with flows
only very near the shock surface, the equations governing laminar boundary
layer flow will be used.

Two solutions have been developed for the boundary layer flow within a
hemispherical blast wave. Both the solution due to Crawford et al.
(Reference 2) and the solution due to S. W. Liu and H. Mirels (Reference 3)
depend on the similarity property of the flow. The similarity assumption
limits the validity of the solution to the high pressure region near the
detonation. In order to extend the solution into the low pressure region far
from the detonation, a study is now underway to formulate a solution that
does not depend on a similarity assumption. Because the solutions of Crawford
and Liu do not agree on the temperature profile behind the shock, it is also
hoped that the planned solution will help clarify this matter.

The planned solution is based on a finite difference scheme. The scheme
requires three initial conditions and seven boundary conditions. The initial
conditions will be taken from Crawford et al. Three of the boundary conditions
will come from the known conditions at the surface. Two of the boundary
conditions will come from an inviscid outer flow solution. The final two
boundary conditions must come from a solution at either the shock/surface
interface or the detonation point.

Singularities exist at both the detonation point and the shock/surface
interface. For a point detonation, the temperature goes to infinity at the
center of explosion, which results in the singularity. The singularity at
the shock/surface interface results from the conflicting requirements of
shock theory and boundary layer theory. Shock theory requires that gas
engulfed by the shock jump discontinuously from a state of rest and ambient
temperature to a state of higher velocity and temperature. Boundary layer
theory requires that the gas adjacent to the surface remain at rest and at
surface temperature (see Figure 2). An attempt was made to find a coordinate
transformation which would eliminate both singularities, allowing a solution
at both ends of the blast wave; none was found. A transformation was found,
however, which allowed an asymptotic solution at the shock/surface interface
alone.

The transformation used to eliminate the singularity at the shock/surface
interface is based on the similarity transform from the Blasius flat plate
solution. The boundary layer problem behind a hemispherical shock does not
meet the necessary conditions to have a similarity solution. The solution
can, however, be considered locally similar near the interface. Near the
interface the transformation reduces the governing equations from a set of
partial differential equations with three independent variables to a set of
ordinary differential equations with the similarity variable $\eta$ as the
independent variable. The asymptotic solution at the interface based on the
similarity transform is the topic of the rest of this paper.
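
As a reminder of what the Blasius transform does in its original setting, the
classical flat-plate form is sketched below. The symbols ($x$ along the plate,
$y$ normal to it, free-stream speed $U_\infty$, kinematic viscosity $\nu$,
stream function profile $f$) belong to the textbook flat-plate problem and are
not the variables of this paper.

```latex
\eta = y\sqrt{\frac{U_\infty}{\nu x}}, \qquad
\frac{u}{U_\infty} = f'(\eta), \qquad
f''' + \tfrac{1}{2}\, f f'' = 0,
\qquad
f(0) = f'(0) = 0, \quad f'(\eta) \to 1 \ \text{as} \ \eta \to \infty
```

The single variable $\eta$ absorbs both $x$ and $y$, so a partial differential
equation collapses to an ordinary one with one condition imposed at infinity.
The blast wave problem appears to follow the same pattern, with the
wall-normal distance measured along the arc $r\omega$.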

2. THE BASIC GOVERNING EQUATIONS. There are three basic equations
governing unsteady compressible boundary layer flow. The first equation is
the Continuity Equation, based on the conservation of mass. The second
equation is the Momentum Equation, based on the conservation of momentum. The
third equation is the Energy Equation, based on the conservation of energy.
Because of the geometry of the problem, the two-dimensional Cartesian form
of the equations has been transformed to the axisymmetric spherical form
(see Figure 3). To simplify the equations, $\omega$, shown in Figure 3, has been
substituted for $\theta$. Further, it has been assumed that $\omega$ is small so that
$\sin\omega \approx \omega$ and $\cos\omega \approx 1$. The resulting equations are shown below.

Figure 2. Singularity at the Shock/Boundary Layer Interface. (Diagram labels:
hemispherical shock front due to a point detonation; shock/boundary layer
interface.)

Continuity Equation

Figure 3. Coordinate System.

In these equations, $\rho$ is the density, $t$ is time, $U_r$ is the radial
velocity, $r$ is the radial distance, $\omega$ is the angle measured from the
surface upwards, $V_\omega$ is the angular velocity, $P_s$ is the pressure in
the outer flow, $\mu$ is the viscosity, $C_p$ is the specific heat, $C_R$ is
the ideal gas constant and $K$ is the thermal conductivity.

In addition to the three basic equations, there are three assumptions
that have been made in formulating the problem. First, it has been assumed
that the gas obeys the ideal gas equation of state, $\rho = P/(C_R T)$. Second,
the Prandtl number is assumed constant, $P_r = \mu C_p/K = \text{constant}$.
Finally, the viscosity is taken to be directly proportional to temperature,
$\mu = \mu_a (T/T_a)$.
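
The three closure assumptions are simple enough to state as code. The Python
sketch below is only an illustration: the numerical constants are typical
sea-level values for air chosen here as assumptions, not data from the paper,
and the function names are hypothetical.

```python
# Illustrative constants for air (assumed values, not taken from the paper).
C_R = 287.0      # gas constant for air, J/(kg K)
C_P = 1005.0     # specific heat at constant pressure, J/(kg K)
PRANDTL = 0.71   # Prandtl number, assumed constant
T_A = 288.0      # ambient temperature, K
MU_A = 1.8e-5    # ambient viscosity, kg/(m s)


def density(pressure, temperature):
    """Ideal gas equation of state: rho = P / (C_R * T)."""
    return pressure / (C_R * temperature)


def viscosity(temperature):
    """Viscosity proportional to temperature: mu = mu_a * (T / T_a)."""
    return MU_A * temperature / T_A


def conductivity(temperature):
    """Constant Prandtl number, so K = mu * C_p / Pr follows from mu(T)."""
    return viscosity(temperature) * C_P / PRANDTL
```

With these relations, density, viscosity and thermal conductivity can all be
eliminated in favor of temperature and pressure, which is how the text uses
them in the next section.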

3. TRANSFORMING AND NORMALIZING THE BASIC GOVERNING EQUATIONS. In order
to eliminate the singularity at the shock/surface interface, the basic equations
are transformed in terms of the following variables (the definitions of
$\eta$, $\bar U_r$ and the other transformed variables are listed in the
Nomenclature).

$\tau = tU_s/R_s$

$\bar\rho = \rho/\rho_s$

$\bar T = T/T_s$

In the transformation, $R_s$ is the shock radius, $\nu_a$ is the kinematic
viscosity ahead of the shock, and $U_s$, $\rho_s$ and $T_s$ are the flow
velocity, density and temperature at the shock front. The transformations
which give $\tau$, $\bar U_r$, $\bar\rho$ and $\bar T$ serve only to
nondimensionalize and normalize the variables. The transformation for $\eta$
allows us to eliminate the singularity located at the shock radius, $R_s$.

The three basic equations have been rewritten in terms of the new
coordinate system. In addition, the ideal gas equation of state has been
used to eliminate $\rho$ in terms of $T$ and $P_s$. The viscosity $\mu$ has
been eliminated in terms of $T$ and the constants $T_a$ and $\mu_a$. The
thermal conductivity $K$ has been eliminated in terms of $T$ and the constants
$P_r$, $C_p$, $T_a$ and $\mu_a$. The resulting set of three coupled,
partial-differential equations is given below.

4. TRANSFORMED GOVERNING EQUATION AT THE SHOCK/SURFACE INTERFACE. As
the shock/surface interface is approached, $r \to R_s$ and therefore $\xi \to 1$.
Thus, as the interface is approached, terms with $(1 - \xi)$ in the numerator
drop out. We are left with derivatives with respect to $\eta$ only. The
continuity equation simplifies to

the Momentum Equation becomes

finally, the Energy Equation becomes

The governing equations require five boundary conditions for solution,
three at the surface and two in the outer flow. At the surface the gas
velocity must go to zero, i.e. $\bar U_r = \bar V_\omega = 0$. In addition, for
an isothermal surface the gas temperature $\bar T$ must equal the surface or
wall temperature, i.e. $\bar T = \bar T_w$. Boundary layer theory requires that
$\bar U_r$ and $\bar T$ asymptotically approach the values of radial velocity
and temperature found in the outer flow, which leads to the conditions
$\bar U_r \to 1$ and $\bar T \to 1$. The five conditions are summarized below.

at $\eta = 0$:    $\bar U_r = \bar V_\omega = 0$ and $\bar T = \bar T_w$

as $\eta \to \infty$:    $\bar U_r \to 1$ and $\bar T \to 1$

With only derivatives with respect to $\eta$ left in the equations, we can now
regard them as a coupled set of ordinary differential equations (O.D.E.'s)
rather than a set of coupled P.D.E.'s. Changing to a simpler notation we
obtain the following.

Continuity Equation

In this notation, the primes denote differentiation with respect to $\eta$. The
coefficients A, B, C and D are independent of $\eta$.


5. ASYMPTOTIC SOLUTION AT THE SHOCK/SURFACE INTERFACE. The outer flow
boundary conditions make the problem an asymptotic boundary-value problem.
In order to make use of conventional algorithms for the integration of
O.D.E.'s, the problem must be transformed into an initial-value problem.
A method developed by Nachtsheim and Swigert of the Lewis Research Center
(Reference 6) is used. Two additional boundary conditions at the surface are
substituted for the two conditions in the outer flow. The new conditions at
the surface require that $\Psi' = X$ and $\Theta' = Y$ when $\eta = 0$. The
correct values of X and Y are unknown initially and must be determined as
part of the solution.

The method of solution is iterative. In the initial iteration, the
values of X and Y are guessed. X and Y are used to start an integration from
the surface to some large value of $\eta$, denoted $\eta_{edge}$. The results of
the integration at $\eta_{edge}$ are used to estimate the error that the
solution would have in the next iteration after a change in X and Y of
$\Delta X$ and $\Delta Y$. $\Delta X$ and $\Delta Y$ are chosen to minimize this
estimated error, and X and Y are changed by the resulting $\Delta X$ and
$\Delta Y$. The new X and Y values serve as the basis for the next integration
to $\eta_{edge}$. The process is repeated until the estimated error falls below
a pre-set limit, at which point the iteration is terminated.

The estimated error, which serves as the basis for convergence, is found
by expanding the solution about X and Y. The values of $\Psi$ and $\Theta$ at
$\eta_{edge}$ are considered to be functions of X and Y. The values of $\Psi$
and $\Theta$ in the next iteration are estimated using the first three terms of
a Taylor series expansion (the current value plus the first-order terms in
$\Delta X$ and $\Delta Y$). The difference between the estimated values of
$\Psi$ and $\Theta$ at $\eta_{edge}$ and their known values in the outer flow
makes up part of the estimated error. Experience has shown that two additional
conditions in the outer flow must be imposed on the solution to ensure that it
is unique. The additional conditions require that the radial velocity and
temperature in the boundary layer approach the outer flow asymptotically, i.e.
$\Psi' \to 0$ and $\Theta' \to 0$ as $\eta \to \infty$. The values of $\Psi'$ and
$\Theta'$ at $\eta_{edge}$ are estimated in the same manner as $\Psi$ and
$\Theta$. The difference between the estimated values of $\Psi'$ and $\Theta'$
at $\eta_{edge}$ and their known values in the outer flow makes up the final
part of the estimated error.

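The four errors which make up the total estimated error in the solution
are designated $\delta_1$, $\delta_2$, $\delta_3$ and $\delta_4$. $\delta_1$ and
$\delta_2$ refer to the difference between the known and estimated values of
$\Psi$ and $\Theta$ respectively. $\delta_3$ and $\delta_4$ refer to the
difference between the known and estimated values of $\Psi'$ and $\Theta'$. The
equations for $\delta_1$, $\delta_2$, $\delta_3$ and $\delta_4$ are shown below.

$\delta_1 = \Psi(\eta_{edge}, X + \Delta X, Y + \Delta Y) - 1$
$\quad \approx \Psi(\eta_{edge}, X, Y) + \Psi_x(\eta_{edge}, X, Y)\,\Delta X + \Psi_y(\eta_{edge}, X, Y)\,\Delta Y - 1$
$\quad = \Psi + \Psi_x \Delta X + \Psi_y \Delta Y - 1$

$\delta_2 = \Theta(\eta_{edge}, X + \Delta X, Y + \Delta Y) - 1$
$\quad \approx \Theta(\eta_{edge}, X, Y) + \Theta_x(\eta_{edge}, X, Y)\,\Delta X + \Theta_y(\eta_{edge}, X, Y)\,\Delta Y - 1$
$\quad = \Theta + \Theta_x \Delta X + \Theta_y \Delta Y - 1$

$\delta_3 = \Psi'(\eta_{edge}, X + \Delta X, Y + \Delta Y) - 0$
$\quad \approx \Psi'(\eta_{edge}, X, Y) + \Psi'_x(\eta_{edge}, X, Y)\,\Delta X + \Psi'_y(\eta_{edge}, X, Y)\,\Delta Y$
$\quad = \Psi' + \Psi'_x \Delta X + \Psi'_y \Delta Y$

$\delta_4 = \Theta'(\eta_{edge}, X + \Delta X, Y + \Delta Y) - 0$
$\quad \approx \Theta'(\eta_{edge}, X, Y) + \Theta'_x(\eta_{edge}, X, Y)\,\Delta X + \Theta'_y(\eta_{edge}, X, Y)\,\Delta Y$
$\quad = \Theta' + \Theta'_x \Delta X + \Theta'_y \Delta Y$

In these equations, the subscripts x and y denote partial differentiation
with respect to X and Y.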

The actual convergence criterion is based on the sum of the squares of
the errors $\delta_1$, $\delta_2$, $\delta_3$ and $\delta_4$. This sum of the
squared errors is minimized with respect to $\Delta X$ and $\Delta Y$,
resulting in the following equations.

Minimizing with respect to $\Delta X$,

$\delta_1 \Psi_x + \delta_2 \Theta_x + \delta_3 \Psi'_x + \delta_4 \Theta'_x = 0$

Minimizing with respect to $\Delta Y$,

$\delta_1 \Psi_y + \delta_2 \Theta_y + \delta_3 \Psi'_y + \delta_4 \Theta'_y = 0$

The results are two equations that can be solved simultaneously for $\Delta X$
and $\Delta Y$.
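
The two minimization conditions amount to a small linear least-squares
problem in the two unknown corrections. A minimal Python sketch follows,
assuming the four residuals and their sensitivities at the outer edge are
already available from the integration. The function and argument names are
illustrative and are not taken from the paper.

```python
def ns_correction(resid, sens_x, sens_y):
    """Return (dX, dY) minimizing sum_i (resid[i] + sens_x[i]*dX + sens_y[i]*dY)**2.

    resid  : current residuals, e.g. [Psi - 1, Theta - 1, Psi', Theta'] at the edge
    sens_x : partial derivatives of those four quantities with respect to X
    sens_y : partial derivatives of those four quantities with respect to Y
    """
    # Entries of the 2x2 normal-equation matrix and right-hand side.
    axx = sum(a * a for a in sens_x)
    axy = sum(a * b for a, b in zip(sens_x, sens_y))
    ayy = sum(b * b for b in sens_y)
    bx = -sum(a * r for a, r in zip(sens_x, resid))
    by = -sum(b * r for b, r in zip(sens_y, resid))

    det = axx * ayy - axy * axy
    if det == 0.0:
        raise ZeroDivisionError("sensitivity vectors are linearly dependent")

    # Cramer's rule for the 2x2 system.
    dx = (bx * ayy - axy * by) / det
    dy = (axx * by - axy * bx) / det
    return dx, dy
```

Because four conditions are imposed with only two free parameters, the
corrections generally cannot drive every residual to zero; they give the best
fit in the least-squares sense.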


The minimization equations contain $\Psi$, $\Theta$, $\Psi'$ and $\Theta'$,
which can be evaluated at $\eta_{edge}$ by integrating Equation 1, Equation 2
and Equation 3. They also contain $\Psi_x$, $\Psi_y$, $\Theta_x$, $\Theta_y$,
$\Psi'_x$, $\Psi'_y$, $\Theta'_x$ and $\Theta'_y$, which must be evaluated by
integrating another set of equations. The necessary equations are formulated
by differentiating Equations 1 through 3 with respect to X and Y.
Differentiating with respect to X results in the following:

The boundary conditions for Equations 4, 5 and 6 are the following:

$\Psi_x = \Phi_x = \Theta_x = \Theta'_x = 0$ and $\Psi'_x = 1$ at $\eta = 0$

Differentiating with respect to Y results in the following:

The boundary conditions for Equations 7, 8 and 9 are the following:

$\Psi_y = \Phi_y = \Theta_y = \Psi'_y = 0$ and $\Theta'_y = 1$ at $\eta = 0$

Equations 1 through 9 make up a set of coupled O.D.E.'s which can be
integrated from $\eta = 0$ to $\eta = \eta_{edge}$.
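
The whole procedure can be tried on a smaller problem. The Python sketch
below applies the same idea (guess the unknown wall gradient, integrate the
original and the sensitivity equations together, then take the least-squares
correction) to the classical Blasius flat-plate equation, which has one
unknown gradient instead of two. It is an illustration under stated
assumptions and not the author's code: a fixed-step fourth-order Runge-Kutta
integrator stands in for the Kutta-Merson method, and all names are
hypothetical.

```python
def rhs(y):
    """Blasius equation f''' = -f f''/2 plus its sensitivity g = df/dX."""
    f, fp, fpp, g, gp, gpp = y
    return [fp, fpp, -0.5 * f * fpp,
            gp, gpp, -0.5 * (g * fpp + f * gpp)]


def rk4_step(y, h):
    """One classical fourth-order Runge-Kutta step (the system is autonomous)."""
    def shifted(k, c):
        return [yi + c * ki for yi, ki in zip(y, k)]
    k1 = rhs(y)
    k2 = rhs(shifted(k1, h / 2))
    k3 = rhs(shifted(k2, h / 2))
    k4 = rhs(shifted(k3, h))
    return [yi + h * (a + 2 * b + 2 * c + d) / 6
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def shoot(x_guess, eta_edge, n_steps):
    """Integrate from the wall to eta_edge with f''(0) = x_guess."""
    # Wall conditions: f = f' = 0, f'' = X; sensitivities g = g' = 0, g'' = 1.
    y = [0.0, 0.0, x_guess, 0.0, 0.0, 1.0]
    h = eta_edge / n_steps
    for _ in range(n_steps):
        y = rk4_step(y, h)
    return y


def solve_wall_gradient(x_guess, eta_edge=8.0, n_steps=400,
                        tol=1e-10, max_iter=50):
    """Iterate on X = f''(0) until the least-squares correction is negligible."""
    x = x_guess
    for _ in range(max_iter):
        f, fp, fpp, g, gp, gpp = shoot(x, eta_edge, n_steps)
        d1 = fp - 1.0   # error in f' at the edge (should approach 1)
        d2 = fpp        # error in f'' at the edge (should approach 0)
        # Minimize (d1 + gp*dX)**2 + (d2 + gpp*dX)**2 with respect to dX.
        dx = -(d1 * gp + d2 * gpp) / (gp * gp + gpp * gpp)
        x += dx
        if abs(dx) < tol:
            break
    return x
```

Starting from a guess of 0.5, `solve_wall_gradient(0.5)` should settle near
the textbook Blasius value of about 0.332. The two-parameter problem in this
paper follows the same loop with two sensitivity sets and a 2x2 solve for the
corrections.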

The actual integration is carried out using a Kutta-Merson method. The
algorithm automatically adjusts the step size in $\eta$ to maintain the absolute
truncation error below a specified amount. The algorithm also adjusts the
step so that the results of the integration can be printed out at a number of
specified locations. The basic computational method is shown in Figure 4.
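
For readers unfamiliar with it, a Kutta-Merson step uses five derivative
evaluations to produce a fourth-order result together with an estimate of the
local truncation error. The Python sketch below shows one common textbook
form with a simple halve-or-double step rule. The thresholds and the names
are assumptions made here; the paper does not give the details of its own
implementation.

```python
import math


def merson_step(f, x, y, h):
    """One Kutta-Merson step for y' = f(x, y); returns (new y, error estimate)."""
    n = len(y)
    k1 = [h * v for v in f(x, y)]
    k2 = [h * v for v in f(x + h / 3, [y[i] + k1[i] / 3 for i in range(n)])]
    k3 = [h * v for v in f(x + h / 3,
                           [y[i] + (k1[i] + k2[i]) / 6 for i in range(n)])]
    k4 = [h * v for v in f(x + h / 2,
                           [y[i] + (k1[i] + 3 * k3[i]) / 8 for i in range(n)])]
    k5 = [h * v for v in f(x + h,
                           [y[i] + k1[i] / 2 - 1.5 * k3[i] + 2 * k4[i]
                            for i in range(n)])]
    y_new = [y[i] + (k1[i] + 4 * k4[i] + k5[i]) / 6 for i in range(n)]
    # Merson's local truncation error estimate, largest component.
    err = max(abs(2 * k1[i] - 9 * k3[i] + 8 * k4[i] - k5[i]) / 30
              for i in range(n))
    return y_new, err


def integrate(f, x0, y0, x_end, h0, tol):
    """Integrate from x0 to x_end, keeping the estimated step error below tol."""
    x, y, h = x0, list(y0), h0
    while x_end - x > 1e-12:
        h = min(h, x_end - x)      # land exactly on the requested end point
        y_new, err = merson_step(f, x, y, h)
        if err > tol:
            h *= 0.5               # reject the step and retry with half the size
            continue
        x, y = x + h, y_new
        if err < tol / 64:
            h *= 2.0               # error comfortably small: try a larger step
    return y
```

As a quick check, integrating $y' = -y$ from 0 to 1 with `integrate(lambda x,
y: [-y[0]], 0.0, [1.0], 1.0, 0.3, 1e-8)` should reproduce $e^{-1}$ closely.
Clipping the step to the end point is the same device that allows output at
prescribed locations.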

6. RESULTS. Nachtsheim and Swigert (Reference 6) have shown that the range of
initial X and Y for which the iteration will converge decreases with increasing
values of $\eta_{edge}$. They have also shown that the accuracy of the solution
increases with increasing values of $\eta_{edge}$. Therefore it is advisable to
start the calculation with small values of $\eta_{edge}$ and a relatively large
acceptable error, then use the results to move to larger values of
$\eta_{edge}$ and smaller acceptable errors.

A set of four runs was made using different values of $\eta_{edge}$. In each
run, the step size in $\eta$ was $\Delta\eta = 0.3$, the truncation error limit
was $1.0 \times 10^{-4}$, and the limit on the sum of the squares of the
estimated errors $\delta_1$, $\delta_2$, $\delta_3$ and $\delta_4$ was
$1.0 \times 10^{-6}$.

The first run was a five-step integration with $\eta_{edge} = 1.5$. The
iteration converged rapidly, but failed to match the outer flow to within the
error limit. The run was terminated after fifteen iterations. The results of
the first six iterations of the five-step integration are shown in Figures 5
and 6. The second run was a ten-step integration with $\eta_{edge} = 3.0$. The
iteration converged to an acceptable solution in six iterations. The results
of the sixth iteration are shown in Figures 7 and 8. The third run was a
fifteen-step integration with $\eta_{edge} = 4.5$. The fifteen-step integration
also converged to an acceptable solution in six iterations. The solution from
the fifteen-step integration was more accurate than that from the ten-step
integration. The results of the six iterations in the fifteen-step run are
shown in Figures 9 and 10. A one-hundred-step run was made for comparison with
the other runs. The one-hundred-step run was also found to converge in six
iterations. The results of the sixth iteration of all four runs are shown in
Figures 11 and 12.

Figure 4. Flowchart of Method of Numerical Solution.

7. CONCLUSION. The results of the four computer runs indicate that it
is possible to find an asymptotic solution to the unsteady, compressible
boundary layer equations at the shock/surface interface using the coordinate
transforms developed in this study. Further, they indicate that a reasonably
accurate solution can be achieved using as few as ten integration steps.

The radial velocity ($\Psi$) profile shown in Figure 11 has the same form as
the radial velocity (f) profiles given in References 2 and 3. The temperature
($\Theta$) profile shown in Figure 12 has the same form as the temperature (g)
profiles in the same references. A direct one-to-one comparison with the
profiles in the references is not appropriate because of differences in the
coordinate transforms. The similarity does indicate, however, that all three
methods are yielding the same type of solution in physical space. All three
indicate that, near the interface, both the velocity and temperature approach
their outer flow values asymptotically with no overshoot within the boundary
layer.

The asymptotic solution developed in this study is now available for use
as a boundary condition in a finite difference solution for the entire
boundary layer flow within a hemispherical blast wave. The finite difference
scheme will be based on the three transformed, basic governing equations
already presented. Because the scheme will not contain a similarity
assumption, it should be possible to extend the solution into the lower
pressure region. The complete solution in this region should provide more
accurate estimates of near-surface gas velocities, dust pick-up and dust
transport, which will in turn allow more accurate estimates of the loading on
ground targets during nuclear attacks.

Figure 12. Sixth Iteration in the Solution for the Transformed Temperature.

ABSTRACT. BLOP is a computer code for predicting the air-blast loads on
targets which can be described approximately by a series of rectangular
parallelepipeds. The code has been developed at BRL to quickly obtain a
prediction of the average loads on the surfaces of a target encountering a
blast wave, without having to resort to a hydrocode computation. The empirical
model employed in the BLOP code is based on experimental and analytical work
done predominantly at BRL. The results compare favorably with available data.

1. INTRODUCTION. This paper describes a simple model for predicting the
blast loading on box-like structures, and compares the results with
available experimental data.

The Blast-Load Prediction (BLOP) code was developed to quickly and
inexpensively estimate the blast loading on structures. The only prediction
method available prior to the development of this code was the standard
prediction technique (Reference 1), which relies on tables and rules of thumb
to estimate the average pressure load on a target surface.

Another method available is the hydrocode. By the use of finite-difference
techniques it is possible to describe the flow field around a target in
detail. But hydrocodes require considerable set-up time and are expensive to
operate. Often a quick estimate of blast loads is needed in engineering and
planning situations, e.g. for a proposed high-explosive (H.E.) field test
where neither the time, nor the funds, nor the manpower are available to
carry out a complex hydrocode computation or a tedious hand computation. The
BLOP code was developed in response to this need.

The BLOP model employs analytical and empirical procedures, the latter
of which are based on experimental work done previously at the BRL. A
one-dimensional flow scheme is employed assuming head-on collision of the shock
with the target. The Rankine-Hugoniot relations are used to define the flow
conditions behind the shock. Three different flow situations are considered:
(1) The shock-tube situation is characterized by a step shock. (2) The
H.E. field test situation is characterized by an exponentially decaying blast
wave. (3) The simulated blast-wave situation behind the exit of a shock tube
is characterized by a generalized form of the modified Friedlander equation.
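
The Rankine-Hugoniot relations mentioned above can be written in closed form
for an ideal gas. The Python sketch below gives the standard textbook
expressions in terms of the peak overpressure and the ambient pressure. It is
not an excerpt from the BLOP code; the function names and the choice of a
constant ratio of specific heats of 1.4 are assumptions made for illustration.

```python
import math

GAMMA = 1.4  # ratio of specific heats for air (assumption: ideal gas, constant)


def shock_mach(p_so, p_a):
    """Shock Mach number for peak overpressure p_so and ambient pressure p_a."""
    return math.sqrt(1.0 + (GAMMA + 1.0) / (2.0 * GAMMA) * p_so / p_a)


def density_ratio(p_so, p_a):
    """Ratio of the density just behind the shock to the ambient density."""
    return ((2.0 * GAMMA * p_a + (GAMMA + 1.0) * p_so)
            / (2.0 * GAMMA * p_a + (GAMMA - 1.0) * p_so))


def dynamic_pressure(p_so, p_a):
    """Peak dynamic pressure (one half rho u squared) just behind the shock."""
    return p_so ** 2 / (2.0 * GAMMA * p_a + (GAMMA - 1.0) * p_so)


def reflected_overpressure(p_so, p_a):
    """Peak overpressure for head-on (normal) reflection from a rigid face."""
    return 2.0 * p_so + (GAMMA + 1.0) * dynamic_pressure(p_so, p_a)
```

For example, an incident overpressure equal to one atmosphere gives a dynamic
pressure of 5/16 of an atmosphere and a normally reflected overpressure of
2.75 atmospheres under these assumptions.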

The average overpressure functions for the front and back faces of
targets are empirical functions developed at BRL. The roof and side faces
are treated according to the standard prediction technique. An attempt is
made with the BLOP model to apply the simple case of the rectangular
parallelepiped to a complex structure, such as a truck or a helicopter
tailboom, by subdividing it into a convenient number of sections each of which
is represented by a rectangular box. Closed, partially open, and open-frame
structures are considered.

The code was written in FORTRAN IV for use on the UNIVAC 1108 at the
Aberdeen Proving Ground, Edgewood, Maryland. The code contains detailed user
instructions and numerous other comments. It will be published as an appendix
to a BRL report, which describes the physical phenomena of blast waves and the
target loading procedures used in the BLOP model in greater detail. Here,
they are reviewed briefly together with a discussion of the results.

2. PHYSICAL PHENOMENA. This section offers a brief description of
blast-wave phenomena in as much detail as is necessary to introduce and explain
the terminology used in this paper.

2.1  Blast  Waves.  When  a  high-energy  weapon  is  detonated  at  some  height 
above  the  ground,  the  pressure  waves  emanating  from  the  center  of  explosion 
rapidly  form  a  spherical  blast  wave,  characterized  by  an  abrupt  increase  of 
the  air  pressure  across  the  shock  front.  Figure  1  illustrates  the  progress 
of  the  blast  wave  along  the  ground  surface.  As  the  incident  blast  wave 
expands,  the  shock  strength  at  the  front  decreases.  Where  the  shock  front 
contacts  the  ground  surface,  it  is  reflected. 

The reflected shock front moves back into the air already compressed and
heated by the incident shock. Behind the reflected shock, a new blast wave
forms with properties different from those of the incident blast wave.
Because of these conditions, the reflected shock front moves faster than the
incident shock front, gradually catches up with it, and combines with it into
a reinforced shock front at some distance from ground zero (i.e. the reference
point directly under the center of explosion).

This  new  shock  front  is  called  the  Mach  stem  of  the  blast  wave.  The 
Mach  stem  stands  essentially  normal,  and  moves  parallel  to  the  ground  surface. 
This  phenomenon  considerably  simplifies  the  treatment  of  blast  loading  of 
structures  located  in  the  region  of  Mach  reflection. 

An explosion on the surface results in somewhat different air-blast
phenomena. The blast wave forms a hemispherical, reflected shock front over
the surface. There is no region of regular reflection in this case, and
targets on the ground are subjected to air-blast conditions similar to those
in the Mach-reflection region even close to ground zero. The shock front may
be assumed to be vertical for most purposes. The wind behind the shock front
and near the surface blows horizontally for all practical purposes.

A  comprehensive  description  of  blast  waves  and  their  effects  on  man  and 
equipment  can  be  found  in  Reference  1.  Reference  2  contains  a  comprehensive 
collection  of  analytical  and  experimental  studies  on  the  subject  of  air-blast 
technology. 

2.2  Pressure  History.  Figure  2  gives  a  typical  overpressure  history  as 
it  may  be  recorded  at  some  spatial  location  in  the  Mach-reflection  region. 

The origin of the overpressure and time axes is set at the time of explosion, $t_0$. When the shock front of the blast wave arrives at time $t_a$, perhaps a few seconds after the explosion, the pressure increases suddenly. The peak value, $p_{so}$, is called the peak shock overpressure. The temperature and the density of the air suddenly increase, also.

Behind the shock front, the overpressure quickly drops to about one-half of its peak value and, falling steadily, returns to zero after a time $t_+$ has elapsed since shock arrival. This time is called the positive-phase duration because of the positive overpressure that prevails. The positive phase is followed by the negative phase, during which the pressure drops below atmospheric pressure. Subsequently it returns to ambient conditions.

During the positive phase, strong winds follow the shock front, giving rise to a positive dynamic pressure, $q_{so}$. This dynamic pressure decays with the static overpressure but at a slower rate, and the wind continues to blow for a short while beyond the positive-phase duration. This means that the positive phase of the dynamic pressure lasts a little longer than the positive phase of the static pressure.

During  the  negative  phase,  the  wind  of  the  dynamic  pressure  reverses  its 
direction  and  blows  toward  the  center  of  the  explosion.  Some  damage  may  be 
expected  during  the  negative  phase  of  the  blast  wave  but  it  is  during  the 
positive  phase  that  most  of  the  damage  to  structures  occurs.  Therefore, 
loading  and  response  studies  are  restricted  to  the  positive  phase  of  the  blast 
wave. 


3.  COMPUTATIONAL  MODEL.  To  keep  the  computational  model  simple,  the 
following  assumptions  were  made. 

(1)  The  free-field  flow  is  essentially  one-dimensional.  This  entails 
a  shock  front  which  can  be  considered  planar  and  perpendicular  to  the 
direction  of  propagation. 

(2)  The  shock  front  will  hit  the  model  head  on,  i.e.  the  velocity  vector 
will  stand  normal  to  the  front  face.  The  shock  front  and  the  front  face  of 
the  structure  are  thus  parallel  planes. 

(3)  Empirical  equations  will  be  used  as  pressure-decay  functions  and  as 
average-load  functions  for  the  surfaces. 

(4) Target structures can be modelled by a series of rectangular parallelepipeds. This assumption is less restrictive than it may seem at first glance.

The following load cases can be adequately described under these assumptions.

(1)  A  step  shock  can  simulate  the  test  conditions  in  a  shock  tube. 

(2) An exponentially decaying wave can simulate the free-field conditions in the Mach-reflection region.

(3) A decaying wave as would be generated at the exit of a shock tube can simulate the special test conditions in the field behind the shock tube.

3.1 Shock Relations. The assumption of one-dimensional flow and the restriction to normal shock incidence allow the use of simple analytical equations like the Rankine-Hugoniot relations to define the conditions behind the shock front and behind the reflected shock. The shock-front velocity, $U_s$, is defined by Equation (1), where $a_o$ is the sound velocity in ambient air and $\gamma$ is the ratio of specific heats. The shock strength is defined by Equation (2), where $P_o$ is the ambient, atmospheric pressure and $P_1$ the absolute pressure behind the shock front. The wind velocity behind the shock front is given by Equation (3), and the dynamic pressure by Equation (4). For the definitions of other shock relations, the reader is referred to the literature.3
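The shock relations named in this subsection can be sketched numerically. The Python fragment below uses the standard textbook Rankine-Hugoniot forms for an ideal gas, written in terms of the shock strength $P_1/P_o$. These forms, the function name `shock_state`, and the default ambient values are assumptions made for illustration; they are not taken from the BLOP code and may differ in notation from Equations (1) to (4).

```python
import math


def shock_state(p_so, P_o=101.325e3, a_o=340.0, gamma=1.4):
    """Conditions behind a normal shock of overpressure p_so (Pa).

    Textbook ideal-gas Rankine-Hugoniot relations; all names and the
    default ambient values (P_o in Pa, a_o in m/s) are illustrative.
    """
    xi = (P_o + p_so) / P_o                      # shock strength P_1 / P_o
    mach = math.sqrt(1.0 + (gamma + 1.0) / (2.0 * gamma) * (xi - 1.0))
    U_s = a_o * mach                             # shock-front velocity
    u_1 = a_o * (xi - 1.0) / (gamma * mach)      # wind velocity behind the front
    # Dynamic pressure q = 0.5 * rho_1 * u_1**2, reduced with rho_o*a_o**2 = gamma*P_o
    q_so = P_o * (xi - 1.0) ** 2 / ((gamma - 1.0) * xi + (gamma + 1.0))
    return {"xi": xi, "U_s": U_s, "u_1": u_1, "q_so": q_so}
```

For air with $\gamma = 1.4$ the last expression reduces to the familiar form $q = 2.5\,p_{so}^2 / (7 P_o + p_{so})$, which gives a quick check on the arithmetic.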


3.2 Pressure-Decay Functions. The decay of the blast-wave overpressure at a fixed target location is modelled by the Friedlander equation

$$ p(\tau) = p_{so}\,(1 - \tau)\,e^{-c\tau}, \qquad (5) $$

where $p_{so} = P_1 - P_o$ is the shock overpressure, $c$ is a time coefficient, and $\tau$ is the non-dimensional time, defined by

$$ 0 \le \tau = \frac{t - t_a}{t_+} \le 1. \qquad (6) $$

The time coefficient, $c$, is a function of the peak shock overpressure and time, and the model assumes a linear variation of $c$ with $\tau$ from an initial maximum value to a final minimum value. These values are empirical and form part of the required input.

The dynamic pressure decays in a fashion similar to the static pressure and analogous to the Friedlander equation,

$$ q(\tau_q) = q_{so}\,(1 - \tau_q)^2\,e^{-2c\tau_q}, \qquad (7) $$

where the non-dimensional time is defined by

$$ 0 \le \tau_q = \frac{t - t_a}{t_q^+} \le 1. \qquad (8) $$

The peak shock overpressure, $p_{so}$, the time of shock arrival, $t_a$, and the positive-phase durations, $t_+$ for static and $t_q^+$ for dynamic pressure, are tabulated functions of the range from ground zero and are part of the required input.
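The decay functions of Equations (5) to (8) translate directly into code. The following Python sketch is an illustration only: the function names, the argument names `c_initial` and `c_final` for the two empirical end values of the time coefficient, and the use of the same coefficient in both decay laws are assumptions, not details taken from the BLOP code.

```python
import math


def time_coefficient(tau, c_initial, c_final):
    """Linear variation of c from c_initial (tau = 0) to c_final (tau = 1)."""
    return c_initial + (c_final - c_initial) * tau


def overpressure(t, t_a, t_plus, p_so, c_initial, c_final):
    """Static overpressure after Equation (5); zero outside the positive phase."""
    tau = (t - t_a) / t_plus
    if tau < 0.0 or tau > 1.0:
        return 0.0
    c = time_coefficient(tau, c_initial, c_final)
    return p_so * (1.0 - tau) * math.exp(-c * tau)


def dynamic_pressure(t, t_a, t_q_plus, q_so, c_initial, c_final):
    """Dynamic pressure after Equation (7); zero outside its positive phase."""
    tau_q = (t - t_a) / t_q_plus
    if tau_q < 0.0 or tau_q > 1.0:
        return 0.0
    c = time_coefficient(tau_q, c_initial, c_final)
    return q_so * (1.0 - tau_q) ** 2 * math.exp(-2.0 * c * tau_q)
```

At shock arrival both functions return their peak values, and at the end of the respective positive phase both return zero, as the factor $(1 - \tau)$ requires.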

3.3  Average  Loading  Functions.  The  loading  model  used  in  the  BLOP  code 
is  based  on  the  Standard  Prediction  Technique  as  described  in  Reference  1. 
However, empirical loading functions, developed at BRL by Ethridge,4 were used
instead  of  those  functions  used  in  the  Standard  Prediction  Technique. 

The basic loading function for the front face chosen by Ethridge is

$$ p_{FR}(\tau) = \ldots \qquad (9) $$

where $p_{FR}$ is the average overpressure on the front face at time $\tau$, $p_{stag}$ is the average stagnation overpressure on the front face at time $\tau$, $p_r$ is the normally reflected shock overpressure, and $p_{stag,s}$ is the stagnation overpressure immediately behind the shock front. $A(\ell)$, $N$, and $B(\ell,\tau)$ are empirical functions determined by fitting Equation (9) to experimental data.

The basic loading function for the back face chosen by Ethridge is

$$ p_{BK}(\tau) = E\,(1 - e^{G})\,p(\tau_b), \qquad (10) $$

where $\tau_b$ is the non-dimensional time based on the arrival of the shock front at the back face, with

$\ell$ = length of target in flow direction,

$U_s$ = shock-front velocity, given by Equation (1), and

$t_+$ = positive-phase duration.

$E(\ell)$ and $G(\ell,\tau_b)$ are empirical functions determined by fitting Equation (10) to experimental data.
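A form of the back-face time that is consistent with the three quantities just listed can be written down as a worked relation. It is offered here as an assumption, not as the author's own equation: the front-face time of Equation (6) is delayed by the transit time $\ell / U_s$ that the shock front needs to travel the length of the target.

```latex
\tau_b = \frac{t - t_a - \ell / U_s}{t_+} = \tau - \frac{\ell}{U_s\, t_+}
```

Under this reading $\tau_b = 0$ at the instant the shock front reaches the back face, so that $p(\tau_b)$ in Equation (10) starts from the peak of the Friedlander curve at that instant.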

The average-pressure functions used for the sides and the top of the target are those given by the Standard Prediction Technique. They are considered to give an adequate engineering estimate. Since one cannot predict the direction from which a blast wave may approach the target, all surfaces must be examined and designed for a head-on collision with the shock front.

3.4  Modelling  of  Targets.  Existing  methods  for  calculating  the  airblast 
loading  on  targets  cover  only  a  few,  idealized,  simple  shapes.  These  are  (A) 
rectangular  parallelepipeds,  and  (B)  cylinders.  The  first  group  can  be 
further  divided  into 

(1)  Closed  Structures:  Structures  with  a  flat  roof  and  bearing  walls 
having  either  no,  or  only  small  openings  (amounting  to  less  than  5%  of  the 
surface  area)  fall  into  this  category,  e.g.  shelters. 

(2)  Partially  Open  Structures:  Structures  which  have  large  openings, 
or  window  areas  in  excess  of  5%  of  the  wall  area  are  classified  as  partially 
open  structures,  e.g.  houses.  Because  the  blast  wave  can  enter  these 
structures,  the  net  loading  of  any  wall  of  the  structure  is  the  difference 
between  the  interior  and  the  exterior  load. 

(3) Open Frame Structures: Structures which have a supporting steel or concrete frame and nonbearing walls, e.g. modern office buildings or truss structures. The more significant contribution to the loading of these structures is made by the wind behind the shock front, which creates a considerable drag loading.

An attempt is made with the BLOP code to apply the simple load case of a rectangular parallelepiped to targets which may be approximately described as an assembly of several rectangular boxes. Figure 3 illustrates the application of this concept to a helicopter tailboom. The target is subdivided into a convenient number of boxes rigidly attached to each other such that they together resemble the shape of the target. The purpose of this subdivision is to accommodate variations of the incident shock overpressure, time of shock arrival, and positive-phase duration along the major target axis.
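The subdivision into boxes can be illustrated with a short Python sketch. It assumes that the blast-field input is available as tables of (range, value) pairs sorted by range, that linear interpolation between table entries is acceptable, and that each box is assigned the blast parameters at its mid-point; the function names and the dictionary layout are hypothetical and do not describe the BLOP input format.

```python
from bisect import bisect_right


def interpolate(table, r):
    """Linear interpolation in a list of (range, value) pairs sorted by range.

    Values are held constant outside the tabulated interval.
    """
    ranges = [row[0] for row in table]
    if r <= ranges[0]:
        return table[0][1]
    if r >= ranges[-1]:
        return table[-1][1]
    i = bisect_right(ranges, r)
    (r0, v0), (r1, v1) = table[i - 1], table[i]
    return v0 + (v1 - v0) * (r - r0) / (r1 - r0)


def subdivide(r_start, length, n_boxes, p_so_table, t_a_table, t_plus_table):
    """Split a target of given length along its major axis into equal boxes.

    Each box receives the blast parameters interpolated at its mid-point range.
    """
    box_length = length / n_boxes
    boxes = []
    for k in range(n_boxes):
        r_mid = r_start + (k + 0.5) * box_length
        boxes.append({
            "range": r_mid,
            "p_so": interpolate(p_so_table, r_mid),
            "t_a": interpolate(t_a_table, r_mid),
            "t_plus": interpolate(t_plus_table, r_mid),
        })
    return boxes
```

Each entry of the returned list then supplies the input for one rectangular-parallelepiped load calculation.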

4. DISCUSSION OF RESULTS. To evaluate the BLOP code, let us compare some blast-loading predictions with available experimental data and a hydrocode computation.

4.1 Shock-Tube Test. The empirical equations used in the BLOP code to determine the average pressure on the front and back faces of a target are based on data obtained from an experimental investigation of diffraction blast loading on two- and three-dimensional blocks.5 The data shown in Figure 4 are representative of Taylor's test results and bracket the pressure range for which the empirical equations were derived.

Figure 4a shows the comparison of the BLOP computation with the 34.5 kPa (5 psi) test results. The agreement is good, even though the computation cannot simulate the drop below the stagnation pressure on the front face which the experimental data show. A slight difference between experiment and computation stems from the fact that the shock overpressure in the test did not equal the nominal value. Figure 4b shows the same comparison at the 138 kPa (20 psi) level. Here, the experimental data follow the prediction very closely on both the front and back faces.

4.2  Helicopter  Tailboom  Test.  Open-ended  shock  tubes  are  blast-wave 
generators.  It  was  found  that  the  BRL  shock  tubes  may  be  used  to  generate 
blast  waves  with  peak  shock  overpressures  from  2-20  kPa  (0.3-3  psi)  in  the 
field  behind  the  shock  tube  exit.  Targets  too  big  to  fit  into  the  shock  tube 
can  be  mounted  some  distance  beyond  the  exit,  and  off-axis  to  avoid  the  gas 
jet.  This  technique  was  successfully  used  at  BRL  to  investigate  the  dynamic 
response  to  blast  loading  of  a  helicopter  tailboom6  using  the  2.4  m  (8  ft) 
shock  tube  as  a  blast-wave  generator.  Figure  5  illustrates  the  test  set-up. 

The blast-field parameters needed for input in the BLOP code were determined from a survey of the blast field behind the shock-tube exit. The computed blast-wave history is compared with the experimentally measured overpressure history for a 13.4 kPa (1.9 psi) shock in Figure 6. The shock-tube generated blast wave does not have the typical, classical shape of a high-explosive blast wave shown in Figure 2. After an initial exponential decay, the overpressure reaches a plateau, the height of which appears to depend on the distance from the shock-tube exit. In the final phase of the blast wave the overpressure decays rapidly. This decay, limiting the positive-phase duration of the simulated blast wave, apparently is caused by the action of rarefaction waves at the shock-tube exit which quickly equalize the overpressure in the exiting gas jet.

In the experiment, overpressures were measured along the symmetry line on the front and back surfaces of the helicopter tailboom. These data are compared with the predicted average overpressure on the front face (Figure 7a) and on the back face (Figure 7b) of the tailboom resulting from the 13.4 kPa blast wave described in Figure 6. The predicted average front-face load (Figure 7a) is too high, particularly during the diffraction loading. This overestimation is most likely due to the modelling of the tailboom as box-like sections with plane surfaces and sharp corners, while the real tailboom has curved surfaces with rounded corners that accelerate the pressure relief from the sides.

The  experimentally  measured  pressure  rise  on  the  back  face  (Figure  7b) 
is  slightly  steeper,  and  the  peak  value  of  the  overpressure  higher  than  the 
predicted  load  curve  indicates.  These  differences  are  consistent  with  those 
observed  on  the  front  face  and  are  also  due  to  the  modelling  of  the  tailboom. 
Two  other  physical  phenomena,  vortex  formation  on,  and  dynamic  response  of 
the  tailboom  may  be  influencing  the  experimental  curves.  But  on  the  whole, 
the predicted curve matches the experimental data well.

4.3 Equipment Shelter on MISER'S BLUFF. The S-280 Equipment Shelter was subjected to airblast during the MISER'S BLUFF test series.7 In Figure 8 the free-field, blast-wave history recorded during the test is compared with the computed prediction. The comparison shows (a) that the Friedlander equation very adequately describes the pressure decay in a blast wave, and (b) that the experimental blast wave deviates in some way from the ideal blast wave.


The front- and back-face load histories of the S-280 equipment shelter recorded during MISER'S BLUFF are compared with the BLOP-code computation in Figures 9a and 9b, respectively. The prediction agrees well with the experiment during the diffraction phase, which lasts about 15 milliseconds. The discrepancy between prediction and experiment during the drag phase can be explained by the dynamic response of the shelter wall during the test, whereas the BLOP model assumes a rigid wall. There is also the possibility that air leaked into the shelter under load, increasing the inside pressure, which the differential pressure gages mounted in the shelter walls used as a reference, and thus decreasing the measured pressure difference to the outside.

4.4 HULL Code Prediction. A 3-D HULL computation was performed for an S-280 shelter model exposed to a 34.5 kPa (5 psi) step shock in a shock tube, and the results are compared with the BLOP-code computation. The front- and back-face load histories are shown in Figure 10a, and the side-face load history is shown in Figure 10b.

The  BLOP  prediction  appears  to  "average"  the  HULL  data  points  quite  well 
during  the  diffraction  phase  (Figure  10a),  and  the  agreement  between  the 
results  of  the  two  codes  is  generally  good.  The  "ringing"  of  the  HULL  data 
on  the  front  face  is  typical  for  the  hydrocode  computation  when  artificial 
viscosity  is  not  used.  A  computation  with  artificial  viscosity  was  not 
available  as  of  this  writing. 

On  the  side  face  (Figure  10b),  the  HULL  code  results  come  closer  to 
reality  because  the  pressure  drop  due  to  the  vortex  generated  at  the  front 
edge  is  accounted  for  in  the  loading  history.  Recent,  as  yet  unpublished 
experiments at BRL have validated this hydrocode computation.

5. CONCLUSION. From the foregoing discussion the following conclusions
can  be  drawn. 

(1) Within the limitations imposed by the model, i.e.

- a simplistic, 1-D flow scheme,
- normal shock incidence only,
- empirical, average-load functions,
- crude modelling of structures,

it is possible to obtain a satisfactory estimate of the blast loading on a variety of structures and load situations without resorting to complicated numerical methods.

(2) The BLOP code provides such estimates over a reasonable shock-overpressure range (1-400 kPa) with short set-up time at minimal expense. The cost involved in running BLOP on a digital computer (e.g. UNIVAC 1108) is less than 1% of the cost of a hydrocode run, which makes the code very suitable for parametric studies.

(3)  The  computational  model  is  expandable  to  improve  existing  loading 
functions  and  include  loading  functions  for  other  generic  (e.g.  axisymmetric) 
shapes and for oblique shock impact and reflection.

Figure 2. Pressure History of a Blast Wave.

Figure 4. Comparison of BLOP-Code Prediction with Shock-Tube Data.

Figure 5. Blast-Wave Simulation Technique Using the BRL 2.4 m Shock Tube.

Figure 6. Comparison of the Predicted and Measured Simulated-Blast-Wave History for a Peak Overpressure of 13.4 kPa (1.9 psi).

Figure 7. Comparison of the Predicted and Measured Blast-Loading History for a 13.4 kPa (1.9 psi) Simulated Blast Wave.

Figure 8. Free-Field Blast-Wave History Compared With BLOP-Code Computation for a Peak Shock Overpressure of 42 kPa (6.1 psi).

Figure 10. Comparison of Blast-Loading Computations by BLOP and HULL codes.

ABSTRACT. Front tracking allows greatly increased resolution and
accuracy  for  fluid  flow  problems  dominated  by  discontinuities.  Progress 
is  reported  here  on  the  upgrading  of  previous  calculations  [2,3].  The 
long  range  goal  is  a  conveniently  useable  package  which  is  coherently 
structured  and  applicable  to  a  broad  range  of  problems. 

1. INTRODUCTION. Our previous reports [2,3] presented calculations using front tracking methods. The calculations were performed in the context of petroleum reservoirs, for which the relevant equations are a coupled system of elliptic and hyperbolic equations:

(1) $\mathbf{v} = -k(s)\,\nabla p$

(2) $\nabla \cdot \mathbf{v} = \text{source terms}$

(3) $s_t + \mathbf{v} \cdot \nabla f(s) = \text{source terms}$

Here $p = p(x,y,t)$ is the pressure and $s = s(x,y,t)$ is the saturation.

The  calculations  tested  the  concepts  of  front  tracking  in  a  region  of 
parameters  for  which  the  problem  is  unstable  and  very  difficult  to 
compute.  The  calculations  were  checked  internally  for  numerical 
consistency  (for  example,  by  testing  for  grid  orientation  effects  and 
for  convergence  under  mesh  refinement).  They  were  also  checked  against 
experimental  data.  The  calculations  were  performed  on  a  coarse  grid, 
and  appear  to  represent  a  new  capability  within  computational  fluid 
dynamics,  which  may  be  helpful  for  a  broad  range  of  problems. 

Recently, progress has centered on upgrading the capability of the calculations in several respects.

2. NEW PHYSICS. Previous one-dimensional calculations in gas dynamics [1] are the starting point for a two-dimensional gas dynamics front tracking calculation. The main constructive step is the solution of the Riemann problem. This has now been installed in the two-dimensional code and is undergoing preliminary tests. Special code (e.g. for reflection of waves at boundaries) has yet to be added. One problem on which this method will be tested is the transient flow past an object (wing foil), or through a tube of variable cross section.
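To make the notion of a Riemann problem concrete, the following Python sketch solves it exactly for a much simpler model than gas dynamics: the scalar inviscid Burgers equation $u_t + (u^2/2)_x = 0$ with piecewise constant initial data. The choice of Burgers' equation and the function name `riemann_burgers` are assumptions made purely for illustration; the gas dynamics solver used in the tracking code is considerably more involved and is not reproduced here.

```python
def riemann_burgers(u_left, u_right, xi):
    """Exact Riemann solution of u_t + (u**2 / 2)_x = 0 at xi = x / t.

    The initial data are u_left for x < 0 and u_right for x > 0.
    """
    if u_left > u_right:
        # Compressive data: a single shock moving with the
        # Rankine-Hugoniot speed (u_left + u_right) / 2.
        shock_speed = 0.5 * (u_left + u_right)
        return u_left if xi < shock_speed else u_right
    # Expansive data: a rarefaction fan between the two characteristic speeds.
    if xi <= u_left:
        return u_left
    if xi >= u_right:
        return u_right
    return xi
```

A front tracking method would follow the shock position $x = \text{shock\_speed} \cdot t$ explicitly as a moving curve, and would use a solver of this kind to obtain the states on either side of the front.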

3. NEW GEOMETRY. Arbitrary fronts in two dimensions, including disconnected components and self intersections, are allowed within the framework of the calculation's data structure. This is important because self intersections may occur dynamically within a problem which originally may have had a very simple front. Also, bifurcations can lead to changes of topology at the self intersection points. Examples are droplet formation and Mach stem reflections. Thus it is important to have a computational data structure which allows these events to occur with a minimum of special coding.

4. ELLIPTIC PROBLEMS IN DISCONTINUOUS MATERIALS. Elliptic problems with discontinuous coefficients occur in a wide range of physical problems, for example in incompressible fluid flows. If the location of the discontinuity curve is known accurately, then it is possible to attempt a more accurate solution than would normally be possible. An elliptic solver has been developed recently by O. McBryan [5] for media with an irregular material interface; it uses a mesh alignment algorithm to fit the known discontinuity curve. The main idea is to construct a grid by triangulation of the domain in such a way that each triangle lies entirely on one side of the interface. The grid is a deformation of a regular rectangular grid and is in fact rectangular away from the interface. The equations are then solved using finite elements on this triangulation. The resulting linear equations can be solved efficiently because the matrix is very similar to a regular finite difference operator.
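The benefit of aligning the mesh with a material interface can be seen in a one-dimensional analogue, sketched below in Python. The problem $-(k\,u')' = 0$ on $[0,1]$ with $u(0)=0$, $u(1)=1$ and a coefficient $k$ that jumps at $x = x_{\text{interface}}$ is solved with linear finite elements whose nodes are placed so that one node lies exactly on the interface. This is a deliberately reduced illustration under stated assumptions (one dimension, no source term, hypothetical function name); it is not the two-dimensional triangulation algorithm of the text.

```python
def solve_two_material(n_left, n_right, x_interface, k_left, k_right):
    """Linear finite elements for -(k u')' = 0, u(0) = 0, u(1) = 1.

    n_left and n_right elements are placed on either side of the interface,
    so every element lies entirely within one material.
    """
    x = [x_interface * i / n_left for i in range(n_left + 1)]
    x += [x_interface + (1.0 - x_interface) * j / n_right
          for j in range(1, n_right + 1)]
    n = len(x)
    # Element conductance k_e / h_e; element e joins nodes e and e + 1.
    cond = [(k_left if e < n_left else k_right) / (x[e + 1] - x[e])
            for e in range(n - 1)]
    lower, diag, upper, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    rhs[-1] = 1.0                      # Dirichlet rows keep diag = 1
    for i in range(1, n - 1):          # assembled tridiagonal interior rows
        lower[i] = -cond[i - 1]
        diag[i] = cond[i - 1] + cond[i]
        upper[i] = -cond[i]
    for i in range(1, n):              # Thomas algorithm: forward elimination
        m = lower[i] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):     # back substitution
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return x, u
```

Because the exact solution is linear within each material, the aligned mesh reproduces it at the nodes, including the kink at the interface; the system matrix remains tridiagonal, the one-dimensional counterpart of the near-finite-difference structure mentioned above.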

5. INTERFACE PACKAGE. Complex material topologies and
interfaces  occur  in  a  wide  range  of  problems.  Work  has  begun  on 
developing  a  subroutine  package  for  manipulating  such  interfaces. 
The  package  allows  for  arbitrarily  complex  topologies  and 
geometries  and  is  designed  to  minimize  the  programming  effort 
involved  in  coding  interfaces.  High  level  primitive  operations 
such  as  adding  curves  to  an  interface  or  making  a  copy  of  an 
interface  hide  the  underlying  data  structures  which  have  been 
designed  to  provide  efficient  access  to  topological  information  - 
such  as  which  component  of  a  domain  a  given  point  lies  in.  This 
code will be used in both the shock-tracking codes and the
elliptic  codes  referred  to  previously.  Eventually  the  package 
will  be  extended  to  handle  three-dimensional  interface  surfaces. 
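The kind of high-level primitives described in this section might look as follows in a modern language. The Python sketch below is entirely hypothetical: the class name `Interface`, its methods, and the representation of curves as closed polygons are assumptions for illustration, and the component query uses the well-known ray-casting (even-odd) test instead of whatever data structure the actual package employs.

```python
def _encloses(curve, point):
    """Even-odd ray-casting test: is point inside the closed polygon curve?"""
    x, y = point
    inside = False
    n = len(curve)
    for i in range(n):
        x0, y0 = curve[i]
        x1, y1 = curve[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            # Edge crosses the horizontal line through the point.
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > x:
                inside = not inside
    return inside


class Interface:
    """A collection of closed curves separating material components."""

    def __init__(self):
        self.curves = []

    def add_curve(self, vertices):
        """Add a closed curve given by its vertices; return its index."""
        self.curves.append([tuple(v) for v in vertices])
        return len(self.curves) - 1

    def copy(self):
        """Return an independent copy of the interface."""
        duplicate = Interface()
        duplicate.curves = [list(curve) for curve in self.curves]
        return duplicate

    def component_of(self, point):
        """Label a component by the indices of all curves enclosing the point."""
        return tuple(i for i, curve in enumerate(self.curves)
                     if _encloses(curve, point))
```

Two points lie in the same component, in this simplified setting of non-intersecting closed curves, exactly when `component_of` returns the same tuple for both.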

6. STRUCTURED DESIGN. The front tracking and mesh alignment codes described previously are large and complex pieces of software. A major effort is underway to ensure that these codes can be applied to new problems with a minimum of programming effort. Principles of structured programming are used throughout, and all physics- or geometry-dependent routines have been isolated. Thus the tracking code can be used as a package and easily applied to other problems. All that is required is a main driver routine and the provision of a set of physics-dependent routines, for example a Riemann solver for a hyperbolic conservation law. Similarly, the elliptic code is modularized and requires only a few problem-specific routines such as those to define the coefficient functions and boundary data. Lower-level modules such as a general purpose storage allocator and a debugging package are also of more general use. Supporting graphics programs have been designed with device and system independence. Thus the same program can generate a Tektronix plot on a VAX 11/780 or a movie on a CDC 6600. Further developments, such as three-dimensional and colour graphics, will be needed for the effective interpretation of more complex codes.

ABSTRACT. A numerical model for computing the vertically averaged
hydrodynamics  of  a  water  body,  including  salinity  effects,  has  been  developed. 
The  model  employs  the  concept  of  boundary  fitted  coordinates  to  allow  for 
an  accurate  representation  of  the  boundary  of  the  region  being  modeled 
while  retaining  the  simplicity  of  the  finite  difference  method  of  solution. 
Although  a  general  curvilinear  coordinate  system  covers  the  physical 
domain,  all  computations  to  solve  the  governing  fluid  dynamic  equations, 
as  well  as  the  computation  of  the  boundary  fitted  coordinate  system,  are 
performed  in  a  transformed  rectangular  plane  with  square  grid  spacing. 

A  combination  implicit-explicit  finite  difference  scheme  has  been 
employed  to  numerically  solve  the  governing  equations.  With  such  a  scheme, 
the  water  surface  elevation  is  computed  implicitly  using  the  Accelerated 
Gauss-Seidel  solution  technique;  whereas,  the  velocity  and  salinity  fields 
are  solved  in  an  explicit  manner.  The  major  advantage  of  such  a  scheme 
is  that  the  speed  of  a  surface  gravity  wave  is  removed  from  the  stability 
criteria  while  many  desirable  features  of  an  explicit  scheme  are  retained. 

Although  additional  work  remains  to  be  completed  before  the  model 
can  be  considered  fully  operational,  preliminary  results  demonstrate  that 
the basic model behaves properly.

1. INTRODUCTION. Since the equations governing the motion of fluids are nonlinear, analytic solutions in general cannot be found and one is forced to resort to numerical techniques to obtain solutions. The two most common such techniques are the finite difference method (FDM) and the finite element method (FEM). There are, of course, both advantages and disadvantages to each of these approaches.

Perhaps  the  most  often  quoted  advantage  of  the  finite  element  method 
is  that  with  this  approach  physical  boundaries  coincide  with  computational 
net  points.  Therefore,  the  modeling  of  flow  within  an  irregular  domain 
can  be  more  accurately  handled  than  with  the  normal  finite  difference 
method  where  the  approach  is  to  construct  a  rectangular  grid  over  the 
domain,  which  forces  the  boundaries  to  be  represented  in  a  "stair  stepped" 
fashion.  However,  a  disadvantage  of  finite  element  methods  is  that  they 
involve  dense  matrices  rather  than  the  sparse  matrices  involved  in  finite 
difference  methods.  This  results  in  more  computational  time  being  required 
in  a  finite  element  model  having  the  same  number  of  mesh  points  as  a  finite 
difference  model.  An  additional  disadvantage  is  that  the  finite  element 
method  is  more  cumbersome  to  code  into  a  computer  model  than  the  finite 
difference  method.  This  can  be  a  problem  not  only  during  the  development  of 
the  model  but  can  also  increase  the  level  of  effort  required  during  later 
model  modifications. 


Accepting that the finite difference method possesses an advantage in simplicity and perhaps computational costs, a logical question is whether or not one can develop ways to circumvent the major disadvantage of having to represent irregular boundaries in a "stair stepped" fashion. One such technique, which has been developed by Thompson, et al.,1,2,3 involves the use of boundary-fitted coordinates. Thompson's method generates curvilinear coordinates as the solution of two elliptic partial differential equations with Dirichlet boundary conditions, one coordinate being specified to be constant on the boundaries, and a distribution of the other specified along the boundaries. However, the numerical computations to solve the governing flow equations, as well as computations for the solution of the coordinate system, are not made in the physical curvilinear coordinate system but rather are made on a rectangular grid with square mesh spacing.
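The idea of generating a grid by solving elliptic equations on a square computational mesh can be sketched in a few lines of Python. The fragment below assumes the homogeneous case, in which the curvilinear coordinates satisfy Laplace's equation in the physical plane; interchanging dependent and independent variables then gives the quasi-linear system $\alpha f_{\xi\xi} - 2\beta f_{\xi\eta} + \gamma f_{\eta\eta} = 0$ for $f = x$ and $f = y$, with $\alpha = x_\eta^2 + y_\eta^2$, $\beta = x_\xi x_\eta + y_\xi y_\eta$, $\gamma = x_\xi^2 + y_\xi^2$. The function name, the plain Gauss-Seidel sweeps, and the omission of any coordinate control functions are assumptions of this sketch and are not claimed to match the model described in the paper.

```python
def smooth_grid(x, y, sweeps=200):
    """Gauss-Seidel sweeps for the homogeneous elliptic grid equations.

    x[i][j], y[i][j] hold physical coordinates of computational node (i, j)
    on a mesh with unit spacing. Boundary nodes are kept fixed (Dirichlet
    data); interior nodes are updated in place.
    """
    ni, nj = len(x), len(x[0])
    for _ in range(sweeps):
        for i in range(1, ni - 1):
            for j in range(1, nj - 1):
                # Central differences in the computational plane.
                x_xi = 0.5 * (x[i + 1][j] - x[i - 1][j])
                y_xi = 0.5 * (y[i + 1][j] - y[i - 1][j])
                x_eta = 0.5 * (x[i][j + 1] - x[i][j - 1])
                y_eta = 0.5 * (y[i][j + 1] - y[i][j - 1])
                alpha = x_eta ** 2 + y_eta ** 2
                beta = x_xi * x_eta + y_xi * y_eta
                gamma = x_xi ** 2 + y_xi ** 2
                denom = 2.0 * (alpha + gamma)
                if denom == 0.0:
                    continue  # degenerate cell; leave the node unchanged
                for f in (x, y):
                    cross = 0.25 * (f[i + 1][j + 1] - f[i + 1][j - 1]
                                    - f[i - 1][j + 1] + f[i - 1][j - 1])
                    f[i][j] = (alpha * (f[i + 1][j] + f[i - 1][j])
                               + gamma * (f[i][j + 1] + f[i][j - 1])
                               - 2.0 * beta * cross) / denom
    return x, y
```

All arithmetic takes place on the index grid $(i, j)$, that is, on a rectangular grid with square mesh spacing, while the arrays `x` and `y` carry the irregular physical geometry through their boundary values.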

The  mathematical  modeling  of  the  hydrodynamics  of  a  body  of  water 
plus  the  transport  and  dispersion  of  a  conservative  constituent  within 
that  body  involves  the  solution  of  a  set  of  partial  differential  equations 
expressing  the  conservation  of  mass,  momentum,  and  energy  of  the  flow 
field  along  with  a  transport  equation  for  the  constituent.  These  equations 
involve  derivatives  with  respect  to  time  as  well  as  three  spatial  dimensions. 
However,  a  simplification  that  is  often  made  in  treating  relatively  shallow 
bodies  of  water  that  are  well  mixed  over  the  depth  is  to  vertically  average 
the  three-dimensional  (3D)  equations  to  yield  a  two-dimensional  (2D)  set 
for  nearly  horizontal  flows. 

Since the early to mid 1960's, many finite difference, plus a few finite element, computational models for vertically averaged flows have been developed.4,5,6,7 The purpose of this paper is to describe the development of a new vertically averaged hydrodynamic model which is fully coupled with the water salinity through its influence on the water density. The finite difference method of solution is employed but, unlike the previously developed models, solutions are obtained on a boundary-fitted coordinate system to provide an accurate representation of boundary geometry.

2. BASIC HYDRODYNAMIC EQUATIONS. The Navier-Stokes equations are the basic governing equations for the solution of fluid dynamic problems and express the conservation of mass and momentum of the flow field. In addition, for problems in which salinity effects are important, a separate conservation of mass equation must also be written for the salinity, along with an equation of state relating the water density to the salt concentration and the water temperature. With the closure of such a system, there exist six equations to be solved for the six unknowns: density $\rho$, three velocity components $u$, $v$, $w$, pressure $p$, and salinity $s$.

After  temporally  as  well  as  vertically  averaging  the  equations 
discussed  above,  the  final  form  of  the  governing  equations  in  Cartesian 
coordinates  is: 
Continuity:


y-momentum:

$$\frac{\partial (hv)}{\partial t} + \frac{\partial (huv)}{\partial x} + \frac{\partial (hv^2)}{\partial y} \;\cdots$$

Equation of State: $\rho = \rho(s,T)$


where φ = water surface elevation
h  =  water  depth 
u,v  =  velocity  components 
=  atmospheric  pressure 

ρ = water density

D_xx, D_xy, D_yx, D_yy = eddy viscosity coefficients

v_w = wind speed

α = wind direction

f = Coriolis parameter

g = acceleration of gravity

s = salt concentration

T  =  water  temperature 

E  ,  E  =  eddy  diffusivity  coefficients 


A discussion of the development of these equations can be found in
reference 8.


The above set of equations must now be transformed into a (ξ,η)
boundary-fitted coordinate system such that ξ and η are the independent
variables. The resulting set of equations will then be solved in a
transformed rectangular plane as previously discussed. In order to
accomplish the transformation, the following expressions are
utilized:


$$f_x = \frac{1}{J}\left[(f\,y_\eta)_\xi - (f\,y_\xi)_\eta\right]$$

$$f_y = \frac{1}{J}\left[-(f\,x_\eta)_\xi + (f\,x_\xi)_\eta\right]$$


It should be noted that these expressions are written in a fully
conservative form, which should result in a more accurate solution in highly
irregular coordinate systems. For brevity, the transformed set of equations
is not presented; the interested reader will find it in reference 8.
Obviously, the transformed equations are more complicated than the
Cartesian form presented as equations 1-5; however, the advantage
of being able to make computations on a rectangular grid far outweighs
any disadvantage resulting from the more complicated set of equations.
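
As an illustration of how conservative expressions of this kind can be evaluated on the transformed plane, the following is a minimal Python sketch; it is not taken from VAHM. It assumes the conservative forms f_x = [(f y_η)_ξ - (f y_ξ)_η]/J and f_y = [-(f x_η)_ξ + (f x_ξ)_η]/J with Jacobian J = x_ξ y_η - x_η y_ξ, unit spacing in ξ and η, and centered differences at an interior node. The function names and the sheared test grid are hypothetical.

```python
def d_xi(g, i, j):
    """Centered difference of g(i, j) in the xi direction (unit spacing)."""
    return 0.5 * (g(i + 1, j) - g(i - 1, j))


def d_eta(g, i, j):
    """Centered difference of g(i, j) in the eta direction (unit spacing)."""
    return 0.5 * (g(i, j + 1) - g(i, j - 1))


def conservative_gradient(f, x, y, i, j):
    """Return (f_x, f_y) at interior node (i, j); arrays are indexed [xi][eta]."""
    X = lambda a, b: x[a][b]
    Y = lambda a, b: y[a][b]
    x_xi = lambda a, b: d_xi(X, a, b)
    x_eta = lambda a, b: d_eta(X, a, b)
    y_xi = lambda a, b: d_xi(Y, a, b)
    y_eta = lambda a, b: d_eta(Y, a, b)
    # Jacobian of the (assumed) transformation at the node itself
    jac = x_xi(i, j) * y_eta(i, j) - x_eta(i, j) * y_xi(i, j)
    # Products are differenced as a whole, which is what makes the form conservative
    fx = (d_xi(lambda a, b: f[a][b] * y_eta(a, b), i, j)
          - d_eta(lambda a, b: f[a][b] * y_xi(a, b), i, j)) / jac
    fy = (-d_xi(lambda a, b: f[a][b] * x_eta(a, b), i, j)
          + d_eta(lambda a, b: f[a][b] * x_xi(a, b), i, j)) / jac
    return fx, fy


# Hypothetical sheared grid and a linear field f = 2x + 3y
n = 5
x = [[i + 0.5 * j for j in range(n)] for i in range(n)]
y = [[0.25 * i + j for j in range(n)] for i in range(n)]
f = [[2.0 * x[i][j] + 3.0 * y[i][j] for j in range(n)] for i in range(n)]
print(conservative_gradient(f, x, y, 2, 2))
```

For a linear field on a uniformly sheared grid the centered differences are exact, so the sketch should recover the Cartesian gradient of the field at the chosen node.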

3. NUMERICAL ASPECTS. In order to obtain a solution of the
governing set of transformed equations, the method of finite differences
is employed. There are many different types of finite difference schemes
that have been employed in numerical solutions of partial differential
equations. These schemes range from fully explicit to fully implicit,
with a combination of an explicit-implicit scheme being employed in some
cases, e.g., Edinger and Buchak (reference 9). A similar scheme is employed
here. Basically, the computational cycle will consist of the following steps:

a. Solve for the water surface from the continuity equation in
a fully implicit fashion using the Accelerated Gauss-Seidel technique.

b. Using the most recent values of the water surface elevations,
solve for the u and v velocity components from the x and y
momentum equations in an explicit fashion.

c. Solve for the salinity from the salt transport equation in
an explicit fashion.

d. Compute the density from the equation of state, using the most
recently computed salinity field.

e. Step forward in time and repeat the sequence.
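
Step a relies on an accelerated Gauss-Seidel iteration. The sketch below shows the general idea on a model 5-point problem with unit mesh spacing. It assumes that "accelerated Gauss-Seidel" means successive over-relaxation (SOR); the relaxation factor, tolerance, and model equation are illustrative choices, not values taken from VAHM.

```python
def sor_solve(u, rhs, omega=1.5, tol=1e-10, max_iter=10000):
    """Relax u in place until the largest update falls below tol.

    Model equation at each interior node (unit spacing):
        u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1] - 4*u[i][j] = rhs[i][j]
    Boundary entries of u are treated as fixed Dirichlet data.
    Returns the number of sweeps used.
    """
    n, m = len(u), len(u[0])
    for sweep in range(1, max_iter + 1):
        biggest = 0.0
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                # Plain Gauss-Seidel value from the latest neighbours
                gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                             + u[i][j - 1] + u[i][j + 1] - rhs[i][j])
                # Over-relax the update by the factor omega
                change = omega * (gs - u[i][j])
                u[i][j] += change
                biggest = max(biggest, abs(change))
        if biggest < tol:
            return sweep
    raise RuntimeError("SOR did not converge")


# Hypothetical test: boundary data from the harmonic function i + 2j
n = 6
u = [[float(i + 2 * j) if i in (0, n - 1) or j in (0, n - 1) else 0.0
      for j in range(n)] for i in range(n)]
rhs = [[0.0] * n for _ in range(n)]
print("sweeps used:", sor_solve(u, rhs))
```

Setting omega = 1 reduces the sketch to ordinary Gauss-Seidel; values between 1 and 2 give the acceleration.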

Such a scheme as outlined above removes the stability criterion
associated with the speed of a free surface gravity wave, although
diffusive criteria as well as the Torrence condition associated
with the speed of a water particle remain. However, these criteria are
not normally overly restrictive.

The  grid  upon  which  the  governing  equations  are  solved  is  rectangular 
with a grid spacing of Δξ = Δη = 1. The u and v velocity components
are  computed  at  the  corners  of  each  cell  with  the  water  surface  elevation, 
salinity,  and  density  computed  at  the  center  of  a  cell.  The  (x,y) 
coordinates  are  specified  at  the  corners,  the  center,  and  also  at  the 
midpoint  of  each  side  of  a  cell. 

The  basic  difference  equations  are  developed  using  forward  differences 
for  all  time  derivatives.  Centered  differences  are  used  in  all  spatial 
derivatives  except  in  the  convective  terms  where  one  has  the  option  in 
the  computer  model  (called  VAHM  for  Vertically  Averaged  Hydrodynamic 
Model)  of  requesting  the  use  of  either  centered  or  a  form  of  Roache's 
second upwind differencing.
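
To make the two options concrete, the following sketch compares a centered difference with a donor-cell difference for a convective term d(uf)/dx in one dimension. The donor-cell form shown is one common statement of Roache's second upwind differencing; the exact form coded in VAHM is not given in this paper, so the formula, the function names, and the sample data are all assumptions.

```python
def centered_flux_derivative(u, f, i, dx=1.0):
    """Centered approximation of d(u*f)/dx at node i."""
    return (u[i + 1] * f[i + 1] - u[i - 1] * f[i - 1]) / (2.0 * dx)


def second_upwind_flux_derivative(u, f, i, dx=1.0):
    """Donor-cell approximation of d(u*f)/dx at node i.

    Face velocities are averages of the neighbouring nodal values, and the
    transported quantity at each face is taken from the upstream node.
    """
    u_right = 0.5 * (u[i] + u[i + 1])
    u_left = 0.5 * (u[i - 1] + u[i])
    f_right = f[i] if u_right > 0.0 else f[i + 1]
    f_left = f[i - 1] if u_left > 0.0 else f[i]
    return (u_right * f_right - u_left * f_left) / dx


# Hypothetical data: uniform velocity, linearly varying transported quantity
u = [1.0, 1.0, 1.0, 1.0]
f = [0.0, 1.0, 2.0, 3.0]
print(centered_flux_derivative(u, f, 1), second_upwind_flux_derivative(u, f, 1))
```

With a uniform velocity and a linear profile both forms agree; they differ where the profile is curved or the velocity changes sign.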

4. BOUNDARY CONDITIONS. Three types of boundaries are allowed in
VAHM: walls, oceans, and rivers. Wall boundaries are characterized by
the  specification  of  a  no-slip  condition,  i.e.,  the  velocity  components 
u  and  v  are  set  to  be  zero  at  walls.  Although,  physically,  the  flow 
must  be  zero  at  a  solid  boundary,  slip  conditions  on  the  velocity  at  a 
wall  often  give  more  realistic  results  if  the  grid  spacing  is  too  large 
near  the  wall.  Slip  conditions  would  be  implemented  by  setting  the  normal 
component  of  the  velocity  equal  to  zero  with  the  tangential  component 
computed  from  the  expression  for  zero  vorticity.  At  the  present  time, 
only  the  no-slip  condition  is  allowed  in  VAHM. 

Ocean  boundaries  are  characterized  by  the  specification  of  a  time 
varying  water  surface  elevation  at  the  boundary.  Velocities  on  the 
ocean  boundary  are  then  computed  from  a  simplified  form  of  the  momentum 
equation where the diffusive terms have been neglected. One-sided
differences  are  used  to  replace  derivatives  that  need  points  outside 
the  field. 

When  the  flow  is  directed  into  the  computational  field,  the  boundary 
condition  on  the  salinity  is  prescribed  as  that  of  the  ocean.  However, 
when  the  flow  is  moving  out  of  the  computational  field,  the  salinity 
at  an  ocean  boundary  is  set  to  be  equal  to  its  value  at  the  next  point 
inside.
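
The salinity rule at an ocean boundary can be sketched as a small function. The name, the argument order, and the sign convention (positive velocity meaning flow into the computational field) are hypothetical, and the treatment of exactly zero velocity is an assumption, since the text does not address that case.

```python
def ocean_boundary_salinity(inflow_velocity, ocean_salinity, interior_salinity):
    """Salinity to impose at an ocean boundary point.

    inflow_velocity   : normal velocity, positive when directed into the field
    ocean_salinity    : prescribed salinity of the ocean
    interior_salinity : computed salinity at the next point inside
    """
    if inflow_velocity > 0.0:
        # Flood: water entering the field carries the ocean salinity
        return ocean_salinity
    # Ebb (or, by assumption, slack water): copy the next interior value
    return interior_salinity


print(ocean_boundary_salinity(0.2, 30.0, 12.5))
print(ocean_boundary_salinity(-0.2, 30.0, 12.5))
```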




River  boundaries  are  characterized  by  the  specification  of  the 
velocity.  The  salinity  is  set  to  be  zero  and  the  water  surface  elevation 
at  the  center  of  a  river  boundary  cell  is  computed  as  in  any  interior 
cell. 


5.  MODEL  APPLICATION.  In  order  to  demonstrate  the  versatility  of 
VAHM  in  its  ability  to  model  flows  in  rather  general  multiply-connected 
regions  containing  both  river  and  ocean  boundaries,  an  application  has 
been made using the physical geometry in Figure 1.

The first step in the application of VAHM is the generation of
the boundary-fitted coordinates. This is accomplished through a coordinate
generation code developed by Thompson. Output from the coordinate code
is saved on a file for subsequent use by VAHM. The basic input to the
coordinate code is the specification of the (x,y) coordinates of the
boundary points noted on Figure 1. Although various degrees of coordinate
control can be exercised, the boundary-fitted coordinates shown in
Figure 1 were computed using no control. The coordinate system plotted
was the third attempt at generating a useful grid system. Through
the movement of boundary points and/or coordinate control one attempts
to compute boundary-fitted coordinates such that the grid spacing does
not vary rapidly and such that (ξ,η) lines never approach being parallel
to each other. The coordinate system presented satisfies both of these
criteria and thus is considered to be adequate.

For the geometry shown in Figure 1, an application has been made in
which a river boundary is assumed at the top with an ocean on the bottom.
A constant velocity of 0.4 m/s and a zero salinity concentration were
assumed at the river boundary, while the tide curve presented in Figure 2
and an ocean salinity concentration of 30 ppt were prescribed at the ocean
boundary. The initial depth was set to be 11.0 m throughout the system,
with the initial velocity and salinity fields set to zero. Values of
various parameters were prescribed by setting the diagonal components of
the eddy viscosity tensor to 10 m^2/s, setting the Chezy coefficient for
bottom friction to 35 m^(1/2)/s, and employing a computational time step
of 600 sec.
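
The benefit of treating the water surface implicitly can be gauged from the quoted depth and time step. The sketch below assumes the one-dimensional explicit limits Δt ≤ Δx/sqrt(gh) for a gravity wave and Δt ≤ Δx/|u| for a water particle, and takes g = 9.81 m/s^2; these criteria, the function name, and the use of the river velocity as a representative particle speed are assumptions. The grid spacing of the test application is not stated in the paper, so the sketch reports the smallest spacing each criterion would permit.

```python
import math

G = 9.81  # acceleration of gravity in m/s^2 (assumed value)


def minimum_spacing(dt, depth, speed):
    """Smallest mesh spacing allowed by each explicit criterion for a given dt."""
    wave_speed = math.sqrt(G * depth)
    return {
        "wave_speed": wave_speed,             # gravity wave celerity, m/s
        "dx_gravity": wave_speed * dt,        # from dt <= dx / sqrt(g h)
        "dx_particle": abs(speed) * dt,       # from dt <= dx / |u|
    }


limits = minimum_spacing(dt=600.0, depth=11.0, speed=0.4)
for name, value in limits.items():
    print(name, round(value, 1))
```

Under these assumptions a fully explicit scheme run at 600 sec would need mesh spacings of roughly 6.2 km to respect the gravity wave limit, whereas the particle speed limit at 0.4 m/s asks only for 240 m.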

Figures  3-9  present  "snap  shots"  of  the  computed  flow  field  at 
various  times.  The  influence  of  first  the  flood  and  then  the  ebb  portion 
of  the  tide  can  be  clearly  seen.  In  addition  to  velocity  vector  plots, 
one  can  also  consider  the  time  history  of  the  water  surface  elevation  as 
well  as  the  salinity  at  particular  points  in  the  systems.  Figures  10 
and  11  are  examples. 

6. SUMMARY. A numerical model for computing vertically averaged
velocities and salinity plus water surface elevations has been developed.
By employing the concept of boundary-fitted coordinates, irregular
boundaries can be accurately modeled in either simply or multiply-connected
regions. Even though the numerical grid is a nonorthogonal
curvilinear grid in the physical region being modeled, all numerical
computations are carried out in a transformed rectangular grid with
square grid spacing.
A feature of the model is the particular solution technique employed
to numerically solve the governing equations. A combination
implicit-explicit finite difference scheme, patterned after work by Edinger
and Buchak (reference 9) in their development of a laterally averaged
reservoir hydrodynamic model, has been developed to remove the speed of a
gravity wave from stability restrictions on the computation time step while
still retaining some of the advantages of explicit schemes. With such a
scheme, the water surface elevation is computed implicitly using the
Accelerated Gauss-Seidel solution technique while the velocities and
salinity are computed in an explicit fashion.

The model has been developed for general applications. Any number
of river and/or ocean boundaries can be arbitrarily located on the
transformed rectangular plane, as can islands in the interior
of the computation field. Even though a great deal of generality exists,
there are restrictions. For example, only no-slip boundary conditions
are currently treated at solid boundaries and no flooding of those
boundaries is allowed; however, work on removing these restrictions
is ongoing.

Although  VAHM  has  been  developed  to  the  point  where  results  from 
the  test  application  presented  are  encouraging,  additional  work  is  needed 
before  VAHM  can  be  considered  fully  operational. 
Figure 2. Water surface elevation at ocean boundary

Figures 3-7. Snapshots of the computed flow field at various times
THE GEM CODE: DIRECT SOLUTIONS OF ELLIPTIC
AND  MIXED  PROBLEMS  WITH  NON-SEPARABLE 
5-  AND  9-POINT  OPERATORS1 

Patrick  J.  Roache 

Ecodynamics  Research  Associates,  Inc. 

P. O. Box 8172

Albuquerque,  New  Mexico  87198 


ABSTRACT. Timing and accuracy tests of the GEM (General Elliptic Marching)
Code are described. The GEM Code solves elliptic and mixed discretized
two-dimensional partial differential equations by direct (non-iterative)
spatial marching methods. Both 5-point and 9-point stencils may be solved,
with no requirement that the coefficients be separable. Repeat solutions of
5-point operators are solved in a CPU time equivalent of 2 SOR iterations.
The basic GEM depends on problem parameters (primarily a large cell aspect
ratio Δx/Δy) to control the instability incurred in marching elliptic
equations. A stabilized version uses the basic GEM in a multiple patching
scheme to solve larger problems.


1.  INTRODUCTION.  Elliptic  equations  with  non-separable  coefficients 
arise  in  a  variety  of  applications.  Even  the  simple  Poisson  equation  becomes 
non-separable  when  written  in  general  non-orthogonal  coordinates.  The  simplest 
second-order  finite  difference  discretization  then  leads  to  a  9-point  non- 
separable  stencil. 

Such problems are not solvable by fast direct methods such as odd-even
reduction, Hockney's method, etc. Direct solution by brute-force banded
Gaussian elimination is very expensive and limited in problem size by
round-off error and storage.

Iterative methods are most often used for such problems, and multigrid
methods in particular can be very effective. However, any iterative method
depends on the effectiveness of the smoothing operator, which depends on
diagonal dominance. This deteriorates with the addition of first-derivative
terms, either from the physical laws (e.g. convective terms) or from a
non-orthogonal coordinate transformation. Some iterative methods (ADI) fail
completely on even a simple problem like the Poisson equation in Cartesian
coordinates with a large cell aspect ratio Δx/Δy.

Marching  methods  are  at  present  the  only  fast  direct  method  of  solving 
such  problems.  The  GEM  Code  is  a  user-oriented  package  of  subroutines  which 
implement  the  marching  methods  for  a  fairly  wide  class  of  two-dimensional 
problems.  This  paper  describes  the  results  of  timing  and  accuracy  tests  on 
the  GEM  (General  Elliptic  Marching)  Code. 


1  Research  sponsored  by  the  U.S.  Army  Research  Office. 
2. THE GEM CODE. The GEM Code solves elliptic and mixed discretized
two-dimensional partial differential equations by direct (non-iterative)
spatial marching methods. Both 5-point and 9-point stencils may be solved,
with no requirement that the coefficients be separable. For example, it
solves the usual second-order accurate discretization of

$$a F_{xx} + b F_{yy} + c F_x + d F_y + e F_{xy} + f F = g \qquad (1)$$

where a, b, ..., g are all functions of x and y. The methods used in the GEM
Code are described in detail in (1). The basic code is based on "simple
marching" and depends on problem parameters to control the instability
incurred in marching elliptic equations; for realistic physical problems,
this primarily depends on a large cell aspect ratio Δx/Δy.

Operation counts e are given in some detail in (1). For a simple
Poisson equation (5-point operator) without making use of symmetry, in a
square array, this gives

$$e_{\text{initiation}} = 4M^3 + [\text{coefficient illegible}]\,M^2, \qquad e_{\text{repeat}} = 14M^2 \qquad (2)$$

The initiation count is less than that required to establish a single
solution by point SOR, and the repeat count is less than 2 point SOR
iterations. Since operation counts like these neglect many overhead and
subscripting operations, it is necessary to validate them with actual timing
tests, especially since Equation (2) indicates such remarkable efficiency.


When a 9-point operator is used, the marching solution proceeds a line
at a time (like line SOR) and requires a tri-diagonal solution at j + 1 at
each step in the march. This of course increases the operation counts, but
not their order (i.e. repeat solutions are still optimal, with e ∝ M^2).

For  other  aspects  of  the  method,  see  (1). 
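
The idea of simple marching with an influence coefficient correction can be illustrated on a small constant-coefficient 5-point problem. The sketch below is a toy illustration under stated assumptions, not the GEM Code: it takes Dirichlet data on all four sides, uses the stencil ax*(F[i-1][j] + F[i+1][j]) + ay*(F[i][j-1] + F[i][j+1]) - 2*(ax + ay)*F[i][j] = rhs[i][j], and builds the influence matrix by differencing perturbed marches. All function names are hypothetical.

```python
def march(F, rhs, guess, ax, ay):
    """March upward from row j = 1 (interior values = guess).

    Returns the marched field and its mismatch against the top boundary row.
    """
    il, jl = len(F), len(F[0])
    G = [column[:] for column in F]          # F itself keeps the boundary data
    for i in range(1, il - 1):
        G[i][1] = guess[i - 1]
    for j in range(1, jl - 1):
        for i in range(1, il - 1):
            # Solve the stencil centred at (i, j) for the value at (i, j + 1)
            G[i][j + 1] = (rhs[i][j] - ax * (G[i - 1][j] + G[i + 1][j])
                           - ay * G[i][j - 1]
                           + 2.0 * (ax + ay) * G[i][j]) / ay
    mismatch = [G[i][jl - 1] - F[i][jl - 1] for i in range(1, il - 1)]
    return G, mismatch


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def marching_solve(F, rhs, ax=1.0, ay=1.0):
    """Direct solve: find the row-1 values whose march meets the top boundary."""
    n = len(F) - 2
    _, r0 = march(F, rhs, [0.0] * n, ax, ay)
    columns = []
    for k in range(n):
        unit = [0.0] * n
        unit[k] = 1.0
        _, rk = march(F, rhs, unit, ax, ay)
        columns.append([rk[m] - r0[m] for m in range(n)])
    A = [[columns[k][m] for k in range(n)] for m in range(n)]
    guess = solve_linear(A, [-value for value in r0])
    G, mismatch = march(F, rhs, guess, ax, ay)
    for i in range(1, n + 1):
        G[i][-1] = F[i][-1]                  # restore the exact top boundary
    return G, max(abs(value) for value in mismatch)


# Hypothetical test: boundary data from the discrete harmonic function i^2 - j^2
n = 6
F = [[float(i * i - j * j) if i in (0, n - 1) or j in (0, n - 1) else 0.0
      for j in range(n)] for i in range(n)]
rhs = [[0.0] * n for _ in range(n)]
G, end_error = marching_solve(F, rhs)
print("largest end-of-march mismatch:", end_error)
```

In this sketch ax and ay stand for 1/Δx^2 and 1/Δy^2, so a large cell aspect ratio Δx/Δy makes ax small relative to ay and slows the growth of errors along the march. The returned mismatch is meant only as an analogue of an end-of-march error measure.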


3. PROBLEM DESCRIPTION IN THE GEM CODE. The code is written with a
"smart user" in mind, i.e. one who knows both finite differences and FORTRAN.
The discretization of the continuum partial differential equation is left to
the user. The code is written in FORTRAN IV, and the subroutine GEM solves
the stencil

$$\begin{bmatrix} C_1 & C_2 & C_3 \\ C_4 & C_5 & C_6 \\ C_7 & C_8 & C_9 \end{bmatrix} \cdot F_{i,j} = C_{10} \qquad (3)$$

All the coefficients C1 ... C10 are arrays stored in the labeled COMMON
block GEMCOM. (The smart user could change some or all of these to BLANK
COMMON for storage efficiency.)




The subroutine call is of the following form.

CALL GEM(INIT,F,IL,JL,ILD,N59,IPER,ICOR,
         NRCOND,RCOND,JMAR,JBOT,JTOP,NDBC,FDBC,        (4)
         IPVT,CI,KLD,NC10,EMX)

INIT = 0 initiates only, = 1 initiates and solves, > 1 backsolves only. The
solution is stored in F. The problem size is IL x JL, with the actual first
DIMENSION of the arrays being ILD. N59 = 5 or 9 gives the 5-point or
9-point operator solution. (If N59 = 5, the corner coefficients C1, C3, C7
and C9 are ignored.) For IPER = 1, periodic boundary conditions are used in
the x-direction (normal to the marching direction y). ICOR is the number of
corrective clean-up iterations used to reduce round-off error accumulation;
usually, ICOR = 0 is used, but in some cases of marginal stability, ICOR = 1
or 2 may be used.

The LU decomposition and back-solve of the influence coefficient matrix
is done through LINPACK subroutines (2) which are selected by the option
indicator NRCOND. For NRCOND = 1, LINPACK routines are used which give an
estimate of the inverse of the condition number RCOND. The time penalty is
small, and in the author's experience, RCOND has been valuable as a
debugging aid.

JMAR  is  an  option  indicator  for  the  march  direction,  with  ±1  giving  a 
march in the +J or -J direction, respectively. This is a significant option
because  the  stability  of  the  marching  method  is  directional.  For  an  expanding 
coordinate  system  (typical  of  turbulent  boundary  layer  calculations,  for 
example)  the  stability  is  improved  if  the  march  proceeds  from  the  coarse  mesh 
to  the  fine  mesh. 

The  next  four  arguments  are  primarily  of  use  when  GEM  is  driven  by  another 
code  GEMPAT2  which  stabilizes  the  solution  by  patching  subregions  together. 
Without  stabilizing,  JBOT  =  1,  JTOP  =  JL,  NDBC  =  0  and  FDBC  is  ignored.  In 
the  stabilizing  code,  JBOT  and  JTOP  define  the  extent  of  the  subregion  being 
solved,  and  NDBC  =  1  indicates  that  the  solution  along  the  patching  line  has 
Dirichlet  boundary  conditions  defined  in  the  vector  FDBC(IL). 

IPVT and CI are work arrays, dimensioned IPVT(ILD) and CI(KLD,KLD) where
KLD ≥ IL - 2.

NC10 is another option indicator primarily used when patching subroutines
together. For NC10 = 0, the homogeneous problem C10(I,J) = 0 is solved,
regardless of the values stored in C10.

Finally, the variable EMX is the output value of the maximum error in the
solution, which occurs at the end of the march. A significant advantage of
the marching method is that it will not lie to the user. The finite
difference stencil is satisfied virtually to single precision everywhere
except at the end of the march. The solution obtained can be viewed as a
virtually exact solution of a problem with a boundary condition perturbed
by EMX.

Boundary conditions are also specified by the coefficients C1 - C10, as
indicated by the following stencils.

[Equation (5): array of boundary stencils showing which of the coefficients
C1 - C9 enter at each corner, side, and interior point of the mesh; the
layout is not recoverable from the source.]

For example, this stencil indicates that in the lower left-hand corner, at
i = 1 and j = 1, the boundary condition is

C2(1,1)*F(1,2) + C5(1,1)*F(1,1) + C6(1,1)*F(2,1) = C10(1,1)
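
A short sketch may help fix the coefficient numbering. It assumes the layout implied by the corner example, in which C2 multiplies F(i,j+1) and C6 multiplies F(i+1,j): C1, C2, C3 act on row j+1, C4, C5, C6 on row j, and C7, C8, C9 on row j-1. That layout, the 0-based indexing, the dictionary storage, and the function name are all assumptions made for illustration; GEM itself stores the coefficients as FORTRAN arrays in COMMON.

```python
# Assumed neighbour offsets (di, dj) for coefficients C1..C9
OFFSETS = {
    1: (-1, +1), 2: (0, +1), 3: (+1, +1),
    4: (-1, 0),  5: (0, 0),  6: (+1, 0),
    7: (-1, -1), 8: (0, -1), 9: (+1, -1),
}


def stencil_residual(C, F, i, j):
    """Left side minus right side of the stencil equation at node (i, j).

    C maps k = 1..10 to a 2-D list C[k][i][j]; F is a 2-D list F[i][j].
    Zero coefficients are skipped, so boundary nodes never touch
    neighbours that lie outside the mesh.
    """
    total = 0.0
    for k, (di, dj) in OFFSETS.items():
        coefficient = C[k][i][j]
        if coefficient != 0.0:
            total += coefficient * F[i + di][j + dj]
    return total - C[10][i][j]


# Hypothetical corner condition mirroring the example above (0-based indices)
n = 3
F = [[float(10 * i + j) for j in range(n)] for i in range(n)]
C = {k: [[0.0] * n for _ in range(n)] for k in range(1, 11)}
C[2][0][0], C[5][0][0], C[6][0][0] = 2.0, 3.0, 4.0
C[10][0][0] = 2.0 * F[0][1] + 3.0 * F[0][0] + 4.0 * F[1][0]
print(stencil_residual(C, F, 0, 0))
```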

The general form of Equation (5) allows for all linear combinations of
boundary conditions such as Dirichlet, Neumann, mixed, ratio of derivatives
(a f_x + b f_y = c), etc. However, the requirement for separability of
boundary conditions in the marching y-direction dictates that C2 cannot be
used at the side boundaries. In x, the periodic option indicator IPER = 1
overrides the matrix specification in Equation (5). For the 9-point operator,
the periodic tridiagonal solution is obtained by the method of Reference (3).

The code is written so that none of the arrays C1 - C10 are passed to
other subroutines in argument lists, and the unused portions of the arrays
(e.g. C1 at J = JL) are not used for temporary storage. The idea here is to
allow the user the option of saving the storage space by regenerating some
or all of the coefficients as external or statement FUNCTIONs in FORTRAN.
It is only required of the user that he define the coefficient and remove
that name from COMMON GEMCOM. It should be noted that the significant
storage problem of the ten arrays C1 - C10 is not an aspect of the marching
method, but simply follows from the problem description. The only
significant storage penalty of the method is CI, giving essentially a x2
penalty compared to iterative methods.

4. TEST PROBLEMS AND RESULTS ON THE BASIC CODE. One set of test
problems used pseudo-random number generation for all coefficients, which was
useful in debugging all the options. A second set used a simple Poisson
equation modified by a cross-derivative term VC*f_xy, formulated with centered
second-order differences.

The test problems were run on a CDC 6600. A sampling of the results is
presented in Table 1. Note that the 81 x 81 mesh problem for the 5-point
operator initiates in ~1 ms/cell, the equivalent of 64 Point SOR iterations,
and solves repeat solutions in the equivalent of 2 Point SOR iterations.
For the simple Poisson equation with Δx/Δy = 10, the maximum residual error
is 3.9 x 10^-6.

Measured in equivalent SOR iterations on the 81 x 81 mesh (Table 1), the
9-point operator with non-periodic boundary conditions requires about 67%
more initiation time and about 31% more repeat time than the 5-point
operator. With periodic boundary conditions, the 9-point operator requires
about 3.2 x as long for initiation and about 2.6 x as long for repeat
solutions.
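
These comparisons can be reproduced from the 81 x 81 entries of Table 1. The short sketch below recomputes the ratios both in SOR-equivalent units and in CPU seconds; only the table values come from the paper, while the dictionary layout and function name are illustrative.

```python
# 81 x 81 entries of Table 1: times in seconds and in equivalent SOR iterations
TABLE_81 = {
    "5pt":          {"init_s": 6.61,  "rep_s": 0.206, "init_sor": 64.3,  "rep_sor": 2.00},
    "9pt":          {"init_s": 15.75, "rep_s": 0.384, "init_sor": 107.1, "rep_sor": 2.61},
    "9pt_periodic": {"init_s": 30.24, "rep_s": 0.750, "init_sor": 205.7, "rep_sor": 5.10},
}


def ratio_to_5pt(case, key):
    """Ratio of a table entry to the corresponding 5-point entry."""
    return TABLE_81[case][key] / TABLE_81["5pt"][key]


for case in ("9pt", "9pt_periodic"):
    for key in ("init_sor", "rep_sor", "init_s", "rep_s"):
        print(case, key, round(ratio_to_5pt(case, key), 2))
```

The SOR-equivalent ratios give the quoted 67%, 31%, 3.2 x and 2.6 x figures. The ratios in CPU seconds are larger, which is what one would expect if a 9-point SOR iteration costs more than a 5-point one.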

5. THE STABILIZING CODE GEMPAT2. The method of stabilizing selected
from several available alternatives (1) is the multiple patching method. The
problem size in J (i.e., maximum JL for given cell aspect ratio, etc.) is
doubled by breaking the solution into two subregions separated at J = JPATCH.
With guessed Dirichlet boundary conditions at JPATCH, each subregion is solved
directly using basic GEM. This solution gives non-zero residuals along JPATCH.
The new Dirichlet conditions along JPATCH are then solved directly so as to
zero these residuals. The technique is a capacity matrix or influence
coefficient matrix method which is not essentially connected to marching
methods.

The patching matrix is established in an initiation procedure which requires
IL-2 homogeneous solutions with unit-perturbed Dirichlet conditions along
JPATCH; hence, the homogeneous over-ride option NC10 in GEM.
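
The patching idea can be shown on a one-dimensional analogue, where the patching line reduces to a single point and the patching matrix to a single influence coefficient. The sketch below is an illustration under stated assumptions, not GEMPAT2: it solves u[k-1] - 2*u[k] + u[k+1] = rhs[k] with Dirichlet end values, uses a tridiagonal (Thomas) solve in each subregion in place of basic GEM, and obtains the influence coefficient by differencing two trial solutions. All names are hypothetical.

```python
def solve_segment(left, right, rhs):
    """Thomas algorithm for u[k-1] - 2*u[k] + u[k+1] = rhs[k], fixed end values."""
    n = len(rhs)
    d = list(rhs)
    d[0] -= left
    d[-1] -= right
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = 1.0 / -2.0
    dp[0] = d[0] / -2.0
    for k in range(1, n):
        denom = -2.0 - cp[k - 1]
        cp[k] = 1.0 / denom
        dp[k] = (d[k] - dp[k - 1]) / denom
    u = [0.0] * n
    u[-1] = dp[-1]
    for k in range(n - 2, -1, -1):
        u[k] = dp[k] - cp[k] * u[k + 1]
    return u


def patched_solve(u0, un, rhs, p):
    """Solve on nodes 0..n by patching two subregions at node p (2 <= p <= n-2).

    rhs has n + 1 entries; the two end entries are not used.
    Returns the full solution and the remaining residual at the patch node.
    """
    n = len(rhs) - 1

    def trial(value):
        left = solve_segment(u0, value, rhs[1:p])
        right = solve_segment(value, un, rhs[p + 1:n])
        residual = left[-1] - 2.0 * value + right[0] - rhs[p]
        return residual, left, right

    r0, _, _ = trial(0.0)            # residual with a zero guess at the patch
    r1, _, _ = trial(1.0)            # residual with a unit guess at the patch
    value = -r0 / (r1 - r0)          # the residual is linear in the patch value
    residual, left, right = trial(value)
    return [u0] + left + [value] + right + [un], residual


# Hypothetical test: the exact discrete solution is u[k] = k^2
n = 8
solution, residual = patched_solve(0.0, float(n * n), [2.0] * (n + 1), 4)
print(solution, residual)
```

In two dimensions the scalar influence coefficient becomes a matrix with one column per unknown along the patching line, which is consistent with the IL-2 unit-perturbed solutions mentioned above.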

This procedure for a 2-patch solution is incorporated into the subroutine
GEMPAT2, which then calls GEM. Although several of the options in GEM are not
of interest except for use with GEMPAT2, it was decided to have only one
version of GEM available. The possible confusion arising from the unused
options seems outweighed by the advantage of having only one version of GEM
to document and maintain. Similarly, GEMPAT4, under development, is a code to
implement the patching procedure for a 4-patch solution, and it will call the
only version of GEMPAT2.

6. TIMING TESTS OF GEMPAT2. The patching method for a 2-patch solution
requires two of the CI matrices (one for each subregion) and an additional
storage penalty for the patching matrix, and so the storage penalty is
3 x IL x JL, compared to IL x JL for the single-region solution by the basic
GEM. The theoretical operation count given in (1) may be expected to
deteriorate in accuracy, especially for initiation, as the number of patches
increases.

The GEMPAT2 code is still being refined, but the timing tests on the
initial version are very encouraging. For the 5-point operator with
non-periodic boundary conditions on a 71 x 71 grid, the GEMPAT2 code
initializes in the equivalent of 265 SOR iterations, a factor of 3.1 over
the single region solution. This is significantly better than the value of
4.3 predicted by the operation count (1). Repeat solutions are obtained in
5.2 SOR iterations, a factor of 2.1 over the single region solution, in
agreement with the operation count.

For  the  4-patch  solution,  operation  counts  indicate  penalty  factors  of 
~10  for  initiation  and  4.3  for  repeat  solutions.  For  further  patching,  the 
growing  initiation  and  storage  penalties  make  the  method  unattractive,  and 
we  do  not  plan  a  code  above  GEMPAT4. 
Unlike the single-region solution, in which the error is virtually
confined to the boundary at the end of the march, the patched solutions also
have errors (non-zero residuals) along the patching lines. However, the
patching matrix is usually well-conditioned and this error is acceptable in
the problems tested to date. A more complete investigation of the errors
and timing tests of the codes GEMPAT2 and GEMPAT4 is forthcoming.


Table 1. Timing tests of the GEM Code on a CDC 6600 with Level 2
Optimization of the FORTRAN IV Code. "init" refers to initiation times,
"rep" refers to repeat solution times, "SOR" refers to times for a single
iteration of a Point SOR method including a convergence test but without
boundary calculations, e refers to predictions based on theoretical
operation counts. Total times are in seconds, times/cell are in milliseconds,
based on the minimum of three consecutive runs which included one initiation
and five repeats.


problem grid            31 x 31    51 x 51    81 x 81    81 x 81    81 x 81
                                                                    Periodic
(operator)              (5 pt.)    (5 pt.)    (5 pt.)    (9 pt.)    (9 pt.)

init time                 0.42       1.74       6.61      15.75      30.24
init time/cell            0.47       0.70       1.03       2.46       4.73
init time/SOR             29.7       42.8       64.3      107.1      205.7
rep time                 0.035      0.089      0.206      0.384      0.750
rep time/cell            0.039      0.036      0.032      0.060      0.117
rep time/SOR              2.48       2.19       2.00       2.61       5.10
init time/rep time        12.0       19.6       32.1       41.0       40.3
% error, e_rep             -32        -22        -21        -28        -32
% error, e_init/e_rep      -22        -19        -15         -5         -3
A NEW VARIATIONAL METHOD FOR INITIAL VALUE PROBLEMS,
USING  PIECEWISE  HERMITE  POLYNOMIAL  SPLINE  FUNCTIONS 
ABSTRACT. A variational principle for a functional can be found which
satisfies both the original system and its adjoint system. The variations of
this functional give no boundary terms if the bilinear concomitant of the
systems vanishes. For a second order time varying initial value problem, one
can adjust the boundary conditions of the adjoint system in terms of the
boundary conditions of the original system so that the bilinear concomitant is
identically zero. An expression for the variation of the functional is
derived which contains only the terms involving the variations of the adjoint
variable and its derivative, but no variation of its second derivative. The
variations of the adjoint variable and its derivative are found to be zero
at the final conditions, just as the variations of the original variable and
its derivative are zero at the starting (initial) conditions. This implies
that we are able to solve the problem in one direction without worrying about
the conditions at the other end, as should be the case for an initial value
problem. The algorithm is much simpler than in the past. An example is given
to show the procedures of this new variational method.

I. INTRODUCTION. Variational principles apply mostly to boundary
problems where eigenvalues are sought. They are seldom used for initial value
problems alone, where the far end conditions are neither known nor specified.

If we use discrete methods to solve an initial value problem, such as the
finite difference method, only the initial conditions should be given. In the
same way, if we employ a variational method with spline functions, we should
not be concerned with the far end conditions. This paper gives a procedure to
find a recursive solution of an initial value problem by variational methods
using the cubic Hermite polynomial spline functions.

Let us consider a dynamical system governed by the following equation:

$$L(t)\,y_a(t) = -Q(t) \qquad (1)$$

with appropriate boundary conditions. In the above equation L is a linear
operator, y_a is the dependent variable, Q is a forcing function, and t is the
independent variable.


*Also Professor at Rensselaer Polytechnic Institute, Troy, NY.
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In  order  that  J  be  a  variational  principle  for  G  the  following  requirements 
must  be  satisfied. 

(a) J is stationary about the function y_s which satisfies the relation in
Eq. (1),

\[ L(t)\,y_s = -Q(t) \tag{6} \]

(b) The stationary value of J deduced from Eqs. (2) through (5) is

\[ J[\bar{y}_s, y_s] = G[y_s] = G[y_a] \tag{7} \]
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Consider  first  the  stationarity  of  J  by  taking  the  variation 


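\[ \delta J = \delta\left\{ \int_{t_0}^{t_b} \bar{Q}\,y\,dt + \int_{t_0}^{t_b} \bar{y}\,Q\,dt + \int_{t_0}^{t_b} \bar{y}\,L y\,dt \right\} \]

\[ \delta J = \int_{t_0}^{t_b} \delta\bar{y}\,(L y + Q)\,dt + \int_{t_0}^{t_b} \left[ \bar{Q}\,\delta y + \bar{y}\,L\,\delta y \right] dt \tag{8} \]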


We  will  make  an  effort  later  to  impose  certain  conditions  in  order  that 
the  following  equality  holds: 


\[ \int_{t_0}^{t_b} \bar{y}\,L\,\delta y\,dt = \int_{t_0}^{t_b} \delta y\,\bar{L}\bar{y}\,dt \tag{9} \]

where \bar{L}(t) is the adjoint operator.

By  combining  Eqs.  (8)  and  (9)  one  obtains 


\[ \delta J = \int_{t_0}^{t_b} \delta\bar{y}\,(L y + Q)\,dt + \int_{t_0}^{t_b} \delta y\,[\bar{L}\bar{y} + \bar{Q}]\,dt = 0 \tag{10} \]

Since the variations \delta y and \delta\bar{y} are arbitrary, the stationary
values y_s and \bar{y}_s must satisfy

\[ L y_s = -Q \tag{11} \]

\[ \bar{L}\bar{y}_s = -\bar{Q} \tag{12} \]

Since Eq. (11) is the same as Eq. (6), J is stationary about the function
y_s. Equation (12) is the adjoint equation in terms of the adjoint operator
\bar{L}, the adjoint variable \bar{y}, and the adjoint forcing function \bar{Q}.

Using  the  relation  in  Eq.  (11)  for  the  stationary  value  of  J  from  Eq.  (5) 
we  have 

\[ J[\bar{y}_s, y_s] = \int_{t_0}^{t_b} \bar{Q}\,y_s\,dt + \int_{t_0}^{t_b} \bar{y}_s\,(Q + L y_s)\,dt = G[y_s] \tag{13} \]

Since J is stationary and \delta J \to 0, then

\[ G[y_s] = G[y_a] \tag{14} \]

which is the requirement given in Eq. (7).

It is noted that Eq. (10) contains no boundary terms to be satisfied. This is
an important point in the later discussion of initial value problems.

III. BILINEAR CONCOMITANT. The equality assumed in Eq. (9) is discussed
here by considering the following bilinear concomitant [1]:

\[ D = \int_{t_0}^{t_b} \bar{y}\,L y\,dt - \int_{t_0}^{t_b} y\,\bar{L}\bar{y}\,dt \tag{15} \]

The above expression can also be written in terms of boundary conditions
at t = t_0 and t = t_b. It is assumed that these boundary conditions are
assigned in such a way that the above bilinear concomitant is identically
zero, i.e.,

\[ D = 0 \tag{16} \]

Then the first variation of D also vanishes:

\[ \delta D = \delta D(\delta y) + \delta D(\delta\bar{y}) = 0 \tag{17} \]

Since \delta y and \delta\bar{y} are independent of each other,

\[ \delta D(\delta\bar{y}) = \int_{t_0}^{t_b} \delta\bar{y}\,L y\,dt - \int_{t_0}^{t_b} y\,\bar{L}\,\delta\bar{y}\,dt = 0 \tag{18} \]

and

\[ \delta D(\delta y) = \int_{t_0}^{t_b} \bar{y}\,L\,\delta y\,dt - \int_{t_0}^{t_b} \delta y\,\bar{L}\bar{y}\,dt = 0 \tag{19} \]

Equation (19) is identical to Eq. (9), the equality assumed previously.
Hence if Eq. (16) is true, then Eq. (9), or equivalently Eq. (19), is
automatically true.

IV. INTEGRAL OF BILINEAR EXPRESSION. Consider the integral

\[ I = \int_{t_0}^{t_b} \psi(\bar{y}, y)\,dt \tag{20} \]

where \psi(\bar{y}, y) is an arbitrary bilinear expression [2] of the form

\[ \psi(\bar{y}, y) = \alpha\,\bar{y}'y' + \beta\,y'\bar{y} + \gamma\,y\,\bar{y}' + \epsilon\,\bar{y}\,y \tag{21} \]

The prime (') in the above expression denotes d/dt.

Equation (20) can be integrated by parts. Two different forms of
integration and end conditions may be obtained as follows:

\[ I = -\int_{t_0}^{t_b} \bar{y}\,L y\,dt + (\alpha y' + \gamma y)\,\bar{y}\,\Big|_{t_0}^{t_b} \tag{22} \]

or

\[ I = -\int_{t_0}^{t_b} y\,\bar{L}\bar{y}\,dt + (\alpha\bar{y}' + \beta\bar{y})\,y\,\Big|_{t_0}^{t_b} \tag{23} \]

where the differential expressions are

\[ L y = (\alpha y')' - \beta y' + (\gamma y)' - \epsilon y \tag{24} \]

\[ \bar{L}\bar{y} = (\alpha\bar{y}')' + (\beta\bar{y})' - \gamma\bar{y}' - \epsilon\bar{y} \tag{25} \]

The bilinear concomitant given in Eq. (15) can now be expressed in terms
of the function values and their derivatives at the end points by equating
Eqs. (22) and (23):

\[ D = \left[ \alpha\,(y'\bar{y} - \bar{y}'y) + (\gamma - \beta)\,\bar{y}\,y \right]_{t_0}^{t_b} \tag{26} \]

V. END CONDITIONS FOR THE ADJOINT SYSTEM. In order to satisfy the
expression D = 0 in Eqs. (15) and (16), the end terms in Eq. (26) must vanish.
This requires

\[ \alpha_b\,(y_b'\bar{y}_b - \bar{y}_b'y_b) - \alpha_0\,(y_0'\bar{y}_0 - \bar{y}_0'y_0) + (\gamma_b - \beta_b)\,\bar{y}_b y_b - (\gamma_0 - \beta_0)\,\bar{y}_0 y_0 = 0 \tag{27} \]

Equation (27) can be satisfied identically if the end conditions of the
adjoint system are proportional to the end conditions of the original system
as follows:

\[ \bar{y}_b = (\gamma_0 - \beta_0)\,k\,y_0 \tag{28a} \]

\[ \bar{y}_0 = (\gamma_b - \beta_b)\,k\,y_b \tag{28b} \]

\[ \bar{y}_b' = -\alpha_b^{-1}\alpha_0\,(\gamma_b - \beta_b)\,k\,y_0' \tag{28c} \]

\[ \bar{y}_0' = -\alpha_0^{-1}\alpha_b\,(\gamma_0 - \beta_0)\,k\,y_b' \tag{28d} \]

where k is a constant.

The above expressions give the required end conditions for the adjoint
system in terms of those of the original system. Thus, from Eqs. (15) and (16),

\[ D = \int_{t_0}^{t_b} \bar{y}\,L y\,dt - \int_{t_0}^{t_b} y\,\bar{L}\bar{y}\,dt = 0 \tag{29} \]

To summarize, if one can make the end conditions of the adjoint system
satisfy the relationship in Eq. (28), the bilinear concomitant D vanishes.
The variation in Eq. (10) is then valid.

It is also noted that the variation in Eq. (10) has no far end terms,
which simplifies the computation, since far end terms cause difficulties in
the computational schemes of many variational methods.
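The claim that the proportional end conditions of Eq. (28) make the end terms of Eq. (27) vanish can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the random test data are illustrative assumptions, and each quantity is stored as a pair (value at t_0, value at t_b).

```python
import random

def concomitant_end_terms(alpha, beta, gamma, y, dy, yb, dyb):
    """End terms of the bilinear concomitant, i.e. the left side of Eq. (27).

    Every argument is a pair (value at t_0, value at t_b); yb and dyb hold
    the adjoint variable and its derivative.
    """
    def end(i):
        return (alpha[i] * (dy[i] * yb[i] - dyb[i] * y[i])
                + (gamma[i] - beta[i]) * yb[i] * y[i])
    return end(1) - end(0)

def adjoint_end_conditions(alpha, beta, gamma, y, dy, k):
    """Adjoint end values computed from Eqs. (28a)-(28d)."""
    yb_b = (gamma[0] - beta[0]) * k * y[0]                            # (28a)
    yb_0 = (gamma[1] - beta[1]) * k * y[1]                            # (28b)
    dyb_b = -alpha[0] / alpha[1] * (gamma[1] - beta[1]) * k * dy[0]   # (28c)
    dyb_0 = -alpha[1] / alpha[0] * (gamma[0] - beta[0]) * k * dy[1]   # (28d)
    return (yb_0, yb_b), (dyb_0, dyb_b)

def random_pair(rng):
    """Hypothetical test data: two values in [0.5, 2.0]."""
    return (rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0))

rng = random.Random(0)
alpha, beta, gamma = random_pair(rng), random_pair(rng), random_pair(rng)
y, dy = random_pair(rng), random_pair(rng)
yb, dyb = adjoint_end_conditions(alpha, beta, gamma, y, dy, k=1.7)
residual = concomitant_end_terms(alpha, beta, gamma, y, dy, yb, dyb)
```

For any choice of coefficients and end values the residual should be zero to rounding error, because the terms in alpha cancel in pairs and so do the terms in (gamma - beta).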


VI. THE FIRST VARIATION. Since the variations \delta y and \delta\bar{y} are
independent of each other, we take the first half of Eq. (10) as

\[ \delta J(\delta\bar{y}) = \int_{t_0}^{t_b} \delta\bar{y}\,L y\,dt + \int_{t_0}^{t_b} \delta\bar{y}\,Q\,dt = 0 \tag{30} \]

Equation (30) is not in a form ready for computation. We prefer to use
\delta I, which can be obtained from the bilinear expression I given in
Eqs. (20) and (21). Let

\[ \delta I = \delta I(\delta\bar{y}) + \delta I(\delta y) \tag{31} \]

The first part of the above expression can be derived from Eqs. (20) and (21)
as

\[ \delta I(\delta\bar{y}) = \int_{t_0}^{t_b} \left[ (\alpha y' + \gamma y)\,\delta\bar{y}' + (\beta y' + \epsilon y)\,\delta\bar{y} \right] dt \tag{32} \]

Integrating by parts one obtains

\[ \delta I(\delta\bar{y}) = (\alpha y' + \gamma y)\,\delta\bar{y}\,\Big|_{t_0}^{t_b} - \int_{t_0}^{t_b} \delta\bar{y}\left[ (\alpha y' + \gamma y)' - (\beta y' + \epsilon y) \right] dt \tag{33} \]

It is recognized that the bracketed factor in the integrand of the last term
is Ly. Solving for the last term we have

\[ \int_{t_0}^{t_b} \delta\bar{y}\,L y\,dt = (\alpha y' + \gamma y)\,\delta\bar{y}\,\Big|_{t_0}^{t_b} - \delta I(\delta\bar{y}) \tag{34} \]

Substituting Eq. (32) into (34) and then Eq. (34) into (30) one obtains

\[ \delta J(\delta\bar{y}) = (\alpha_b y_b' + \gamma_b y_b)\,\delta\bar{y}_b - (\alpha_0 y_0' + \gamma_0 y_0)\,\delta\bar{y}_0 - \int_{t_0}^{t_b} \left[ (\alpha y' + \gamma y)\,\delta\bar{y}' + (\beta y' + \epsilon y)\,\delta\bar{y} \right] dt + \int_{t_0}^{t_b} \delta\bar{y}\,Q\,dt = 0 \tag{35} \]

The above equation contains only \delta\bar{y} and \delta\bar{y}' and no
variation of a higher derivative such as \delta\bar{y}'' for a second order
system. Likewise, the dependent variable enters only through y and y', and
not through a higher derivative such as y''.

VII. ADJOINT VARIABLE FAR END VALUE FOR INITIAL VALUE PROBLEMS. For a
second order system the initial values of the function and its first
derivative are given, i.e., y_0 and y_0' are known in Eq. (28). The far end
values for the adjoint system, \bar{y}_b and \bar{y}_b', are found from
Eqs. (28a) and (28c). Since the variation of a constant is zero, then

\[ \delta y_0 = \delta y_0' = 0 \tag{36} \]

and

\[ \delta\bar{y}_b = \delta\bar{y}_b' = 0 \tag{37} \]

The conclusion \delta\bar{y}_b = 0 in Eq. (37) is important in that the first
term on the right side of Eq. (35) vanishes. Thus the coefficient of
\delta\bar{y}_b is not necessarily zero. This implies that the function y_b
and its derivative y_b' at the far end are not related as such. By not using
any local boundary conditions at the far end, the computation can start at
the near end and carry on in one direction.

Thus Eq. (35) is simplified to

\[ \delta J(\delta\bar{y}) = -(\gamma_0 y_0 + \alpha_0 y_0')\,\delta\bar{y}_0 + \int_{t_0}^{t_b} \delta\bar{y}\,Q\,dt - \int_{t_0}^{t_b} \left[ (\epsilon y + \beta y')\,\delta\bar{y} + (\gamma y + \alpha y')\,\delta\bar{y}' \right] dt \tag{38} \]

It is noted that the above equation has no boundary terms to be
satisfied at the far end at time t_b. This is consistent with the physical
notion of an "initial value problem".

VIII. TRANSFORMATION OF COORDINATES. The integrals in Eq. (38) can
be converted into summations if discrete intervals of integration are
used. Since the analysis concerns an initial value problem, without loss of
generality we may let

\[ t_0 = 0 \quad \text{and} \quad t_b = 1, \tag{39} \]

that is, the independent variable is within the interval

\[ 0 \le t \le 1 \tag{40} \]

Equation (38) can be discretized by letting

\[ \xi = K t - m + 1 \tag{41} \]

\[ 0 \le \xi \le 1, \quad 0 \le t \le 1, \quad m = 1, 2, \ldots, K \tag{42} \]

where K is the number of intervals. Thus

\[ d\xi = K\,dt, \qquad dt = K^{-1}\,d\xi \tag{43} \]

The differential relationship is

\[ \frac{dy}{dt} = K\,\frac{dy}{d\xi} \tag{44} \]

or

\[ y' = K\,\dot{y} \tag{45} \]

where

\[ \dot{(\ )} = \frac{d}{d\xi}(\ ) \tag{46} \]

Then Eq. (38) becomes

\[ \delta J(\delta\bar{y}) = 0 = -(\gamma_0 y_0 + \alpha_0 K \dot{y}_0)\,\delta\bar{y}_0 + \sum_{m=1}^{K} \int_0^1 \delta\bar{y}^{(m)}\,Q\,K^{-1}\,d\xi - \sum_{m=1}^{K} \int_0^1 \left[ (\epsilon y^{(m)} + \beta K \dot{y}^{(m)})\,\delta\bar{y}^{(m)} + (\gamma y^{(m)} + \alpha K \dot{y}^{(m)})\,K\,\delta\dot{\bar{y}}^{(m)} \right] K^{-1}\,d\xi \tag{47} \]

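IX. PIECEWISE SPLINE FUNCTIONS. We may express the variables y^{(m)}(\xi) and
\bar{y}^{(m)}(\xi) in terms of the piecewise spline functions a(\xi) and the
node point values Y^{(m)} and \bar{Y}^{(m)} as follows:

\[ y^{(m)}(\xi) = a^T(\xi)\,Y^{(m)}, \qquad \delta y^{(m)} = [\delta Y^{(m)}]^T a(\xi) \tag{48} \]

\[ \dot{y}^{(m)}(\xi) = \dot{a}^T(\xi)\,Y^{(m)}, \qquad \delta\dot{y}^{(m)} = [\delta Y^{(m)}]^T \dot{a}(\xi) \tag{49} \]

\[ \bar{y}^{(m)}(\xi) = a^T(\xi)\,\bar{Y}^{(m)}, \qquad \delta\bar{y}^{(m)} = [\delta\bar{Y}^{(m)}]^T a(\xi) \tag{50} \]

\[ \dot{\bar{y}}^{(m)}(\xi) = \dot{a}^T(\xi)\,\bar{Y}^{(m)}, \qquad \delta\dot{\bar{y}}^{(m)} = [\delta\bar{Y}^{(m)}]^T \dot{a}(\xi) \tag{51} \]

for m = 1, 2, ..., K, and, at the initial point,

\[ y_0 = a^T(1)\,Y^{(0)} \tag{52} \]

\[ \dot{y}_0 = \dot{a}^T(1)\,Y^{(0)} \tag{53} \]

\[ \delta\bar{y}_0 = [\delta\bar{Y}^{(0)}]^T a(1) \tag{54} \]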
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If  Eqs •  (48)  through  (54)  are  substituted  into  Eq.  (47)  one  obtains 

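\[
\begin{aligned}
0 = {} & -[\delta\bar{Y}^{(0)}]^T a(1)\left[ \gamma_0\,a^T(1) + \alpha_0 K\,\dot{a}^T(1) \right] Y^{(0)}
 + \sum_{m=1}^{K} [\delta\bar{Y}^{(m)}]^T K^{-1} \int_0^1 a(\xi)\,Q\,d\xi \\
& - \sum_{m=1}^{K} [\delta\bar{Y}^{(m)}]^T \int_0^1 a(\xi)\left[ \epsilon K^{-1} a^T(\xi) + \beta\,\dot{a}^T(\xi) \right] d\xi\; Y^{(m)} \\
& - \sum_{m=1}^{K} [\delta\bar{Y}^{(m)}]^T \int_0^1 \dot{a}(\xi)\left[ \gamma\,a^T(\xi) + \alpha K\,\dot{a}^T(\xi) \right] d\xi\; Y^{(m)}
\end{aligned}
\tag{55}
\]

This simplifies to

\[ 0 = -[\delta\bar{Y}^{(0)}]^T a(1)\left[ \gamma_0\,a^T(1) + \alpha_0 K\,\dot{a}^T(1) \right] Y^{(0)} + \sum_{m=1}^{K} [\delta\bar{Y}^{(m)}]^T q^{(m)} - \sum_{m=1}^{K} [\delta\bar{Y}^{(m)}]^T P^{(m)}\,Y^{(m)} \tag{56} \]

where

\[ q^{(m)} = K^{-1} \int_0^1 a(\xi)\,Q(\xi)\,d\xi = [q_1^{(m)},\ q_2^{(m)},\ q_3^{(m)},\ q_4^{(m)}]^T \tag{57} \]

and

\[
\begin{aligned}
P^{(m)} &= \int_0^1 \left\{ a(\xi)\left[ \epsilon^{(m)} K^{-1} a^T(\xi) + \beta^{(m)} \dot{a}^T(\xi) \right] + \dot{a}(\xi)\left[ \gamma^{(m)} a^T(\xi) + \alpha^{(m)} K\,\dot{a}^T(\xi) \right] \right\} d\xi \\
&= \epsilon^{(m)} K^{-1} B + \beta^{(m)} C + \gamma^{(m)} D + \alpha^{(m)} K E
\end{aligned}
\tag{58}
\]

or

\[ [p_{ij}^{(m)}] = \epsilon^{(m)} K^{-1} [b_{ij}] + \beta^{(m)} [c_{ij}] + \gamma^{(m)} [d_{ij}] + \alpha^{(m)} K [e_{ij}] \tag{59} \]

where

\[ B = [b_{ij}] = \int_0^1 a(\xi)\,a^T(\xi)\,d\xi \tag{60} \]

\[ C = [c_{ij}] = \int_0^1 a(\xi)\,\dot{a}^T(\xi)\,d\xi \tag{61} \]

\[ D = [d_{ij}] = \int_0^1 \dot{a}(\xi)\,a^T(\xi)\,d\xi \tag{62} \]

\[ E = [e_{ij}] = \int_0^1 \dot{a}(\xi)\,\dot{a}^T(\xi)\,d\xi \tag{63} \]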
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X,  CUBIC  HERMITE  POLYNOMIAL  SPLINE.  The  cubic  Hermite  polynomial  spline 
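is continuous in function value and first derivative across the
nodes. Since no second derivatives of a(\xi) appear in Eqs. (58) to (63), no
higher order spline is necessary for this problem.

The cubic Hermite polynomial gives

\[ a(\xi) = \begin{bmatrix} a_1(\xi) \\ a_2(\xi) \\ a_3(\xi) \\ a_4(\xi) \end{bmatrix} = \begin{bmatrix} 1 - 3\xi^2 + 2\xi^3 \\ \xi - 2\xi^2 + \xi^3 \\ 3\xi^2 - 2\xi^3 \\ -\xi^2 + \xi^3 \end{bmatrix} \tag{64} \]

whose derivatives are

\[ \dot{a}(\xi) = \begin{bmatrix} \dot{a}_1(\xi) \\ \dot{a}_2(\xi) \\ \dot{a}_3(\xi) \\ \dot{a}_4(\xi) \end{bmatrix} = \begin{bmatrix} -6\xi + 6\xi^2 \\ 1 - 4\xi + 3\xi^2 \\ 6\xi - 6\xi^2 \\ -2\xi + 3\xi^2 \end{bmatrix} \tag{65} \]

It can be seen from the above equations that the node point values are

\[ a(0) = [a_1(0)\ \ a_2(0)\ \ a_3(0)\ \ a_4(0)]^T = [1\ \ 0\ \ 0\ \ 0]^T \tag{66a} \]

\[ \dot{a}(0) = [\dot{a}_1(0)\ \ \dot{a}_2(0)\ \ \dot{a}_3(0)\ \ \dot{a}_4(0)]^T = [0\ \ 1\ \ 0\ \ 0]^T \tag{66b} \]

\[ a(1) = [a_1(1)\ \ a_2(1)\ \ a_3(1)\ \ a_4(1)]^T = [0\ \ 0\ \ 1\ \ 0]^T \tag{66c} \]

\[ \dot{a}(1) = [\dot{a}_1(1)\ \ \dot{a}_2(1)\ \ \dot{a}_3(1)\ \ \dot{a}_4(1)]^T = [0\ \ 0\ \ 0\ \ 1]^T \tag{66d} \]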
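The node point values in Eq. (66) and the element matrices of Eqs. (60) to (63) follow directly from the polynomials in Eq. (64). The sketch below, which is not part of the original paper, evaluates them exactly with rational arithmetic; the helper names (`derivative`, `evaluate`, `inner`, `gram`) are illustrative assumptions.

```python
from fractions import Fraction

# Eq. (64): coefficients of a_1 .. a_4 in ascending powers of xi.
A = [[1, 0, -3, 2], [0, 1, -2, 1], [0, 0, 3, -2], [0, 0, -1, 1]]

def derivative(p):
    """Coefficients of dp/dxi for a polynomial given in ascending powers."""
    return [k * c for k, c in enumerate(p)][1:]

def evaluate(p, xi):
    """Value of the polynomial p at xi."""
    return sum(c * xi ** k for k, c in enumerate(p))

def inner(p, q):
    """Exact integral over [0, 1] of p(xi) * q(xi)."""
    return sum(Fraction(ci * cj, i + j + 1)
               for i, ci in enumerate(p) for j, cj in enumerate(q))

def gram(rows, cols):
    """Matrix of integrals of rows[i] * cols[j] over [0, 1]."""
    return [[inner(p, q) for q in cols] for p in rows]

DA = [derivative(p) for p in A]   # Eq. (65)

B = gram(A, A)      # Eq. (60)
C = gram(A, DA)     # Eq. (61)
D = gram(DA, A)     # Eq. (62)
E = gram(DA, DA)    # Eq. (63)

node_values = {
    "a(0)": [evaluate(p, 0) for p in A],
    "da(0)": [evaluate(p, 0) for p in DA],
    "a(1)": [evaluate(p, 1) for p in A],
    "da(1)": [evaluate(p, 1) for p in DA],
}
```

Because the shape functions are polynomials, the integrals are exact rationals and no numerical quadrature is needed for B, C, D, and E. Note that D is the transpose of C, and that B and E are symmetric.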
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We  wish  to  form  a  vector  whose  components  are  taken  from  the  function  and  its 
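derivative at the left node and then the same at the right node. From Eqs.
(48), (49), and (66) we have

\[
\begin{bmatrix} y^{(m)}(0) \\ \dot{y}^{(m)}(0) \\ y^{(m)}(1) \\ \dot{y}^{(m)}(1) \end{bmatrix}
= \begin{bmatrix} a^T(0)\,Y^{(m)} \\ \dot{a}^T(0)\,Y^{(m)} \\ a^T(1)\,Y^{(m)} \\ \dot{a}^T(1)\,Y^{(m)} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} Y^{(m)} = Y^{(m)}
\tag{67}
\]

If we define

\[ Y^{(m)} = [Y_1^{(m)}\ \ Y_2^{(m)}\ \ Y_3^{(m)}\ \ Y_4^{(m)}]^T \tag{68} \]

then

\[ Y_1^{(m)} = y^{(m)}(0) \tag{69a} \]

\[ Y_2^{(m)} = \dot{y}^{(m)}(0) \tag{69b} \]

\[ Y_3^{(m)} = y^{(m)}(1) \tag{69c} \]

\[ Y_4^{(m)} = \dot{y}^{(m)}(1) \tag{69d} \]

The above implies that the same node point is represented by two
notations as follows:

\[ y^{(m+1)}(0) = y^{(m)}(1) \tag{70a} \]

\[ \dot{y}^{(m+1)}(0) = \dot{y}^{(m)}(1) \tag{70b} \]

By expanding Eq. (68) for different m one obtains

\[ Y^{(0)} = [0\ \ 0\ \ Y_3^{(0)}\ \ Y_4^{(0)}]^T = [0\ \ 0\ \ y^{(0)}(1)\ \ \dot{y}^{(0)}(1)]^T \tag{71a} \]

\[ Y^{(1)} = [Y_1^{(1)}\ \ Y_2^{(1)}\ \ Y_3^{(1)}\ \ Y_4^{(1)}]^T = [y^{(1)}(0)\ \ \dot{y}^{(1)}(0)\ \ y^{(1)}(1)\ \ \dot{y}^{(1)}(1)]^T \tag{71b} \]

\[ Y^{(m)} = [Y_1^{(m)}\ \ Y_2^{(m)}\ \ Y_3^{(m)}\ \ Y_4^{(m)}]^T = [y^{(m)}(0)\ \ \dot{y}^{(m)}(0)\ \ y^{(m)}(1)\ \ \dot{y}^{(m)}(1)]^T \tag{71c} \]

\[ Y^{(m+1)} = [Y_1^{(m+1)}\ \ Y_2^{(m+1)}\ \ Y_3^{(m+1)}\ \ Y_4^{(m+1)}]^T = [y^{(m+1)}(0)\ \ \dot{y}^{(m+1)}(0)\ \ y^{(m+1)}(1)\ \ \dot{y}^{(m+1)}(1)]^T \tag{71d} \]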
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Thus  we  have 

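\[ Y_1^{(m+1)} = Y_3^{(m)} \tag{72a} \]

and

\[ Y_2^{(m+1)} = Y_4^{(m)} \tag{72b} \]

for m = 0, 1, 2, ..., K - 1.

Similarly, one can show from Eqs. (50) and (51) that the adjoint variations
satisfy

\[ \delta\bar{Y}_1^{(m+1)} = \delta\bar{Y}_3^{(m)} \tag{73a} \]

and

\[ \delta\bar{Y}_2^{(m+1)} = \delta\bar{Y}_4^{(m)} \tag{73b} \]

XI. METHOD OF SOLUTION. First we take the last term of Eq. (56), which is

\[ R_3 = -\sum_{m=1}^{K} [\delta\bar{Y}_1^{(m)}\ \ \delta\bar{Y}_2^{(m)}\ \ \delta\bar{Y}_3^{(m)}\ \ \delta\bar{Y}_4^{(m)}]\ [p_{ij}^{(m)}]\ [Y_1^{(m)}\ \ Y_2^{(m)}\ \ Y_3^{(m)}\ \ Y_4^{(m)}]^T \tag{74} \]

Using the relationships in Eqs. (72) and (73) gives

\[
\begin{aligned}
R_3 = -\sum_{m=1}^{K} \Big\{ & [p_{11}^{(m)} Y_3^{(m-1)} + p_{12}^{(m)} Y_4^{(m-1)} + p_{13}^{(m)} Y_3^{(m)} + p_{14}^{(m)} Y_4^{(m)}]\,\delta\bar{Y}_3^{(m-1)} \\
+ & [p_{21}^{(m)} Y_3^{(m-1)} + p_{22}^{(m)} Y_4^{(m-1)} + p_{23}^{(m)} Y_3^{(m)} + p_{24}^{(m)} Y_4^{(m)}]\,\delta\bar{Y}_4^{(m-1)} \\
+ & [p_{31}^{(m)} Y_3^{(m-1)} + p_{32}^{(m)} Y_4^{(m-1)} + p_{33}^{(m)} Y_3^{(m)} + p_{34}^{(m)} Y_4^{(m)}]\,\delta\bar{Y}_3^{(m)} \\
+ & [p_{41}^{(m)} Y_3^{(m-1)} + p_{42}^{(m)} Y_4^{(m-1)} + p_{43}^{(m)} Y_3^{(m)} + p_{44}^{(m)} Y_4^{(m)}]\,\delta\bar{Y}_4^{(m)} \Big\}
\end{aligned}
\tag{75}
\]

or, collecting the coefficients of each variation,

\[
\begin{aligned}
R_3 = {} & -[p_{11}^{(1)} Y_3^{(0)} + p_{12}^{(1)} Y_4^{(0)} + p_{13}^{(1)} Y_3^{(1)} + p_{14}^{(1)} Y_4^{(1)}]\,\delta\bar{Y}_3^{(0)} \\
& -[p_{21}^{(1)} Y_3^{(0)} + p_{22}^{(1)} Y_4^{(0)} + p_{23}^{(1)} Y_3^{(1)} + p_{24}^{(1)} Y_4^{(1)}]\,\delta\bar{Y}_4^{(0)} \\
& -\sum_{m=1}^{K-1} \Big\{ [p_{11}^{(m+1)} Y_3^{(m)} + p_{12}^{(m+1)} Y_4^{(m)} + p_{13}^{(m+1)} Y_3^{(m+1)} + p_{14}^{(m+1)} Y_4^{(m+1)}] \\
& \qquad\quad + [p_{31}^{(m)} Y_3^{(m-1)} + p_{32}^{(m)} Y_4^{(m-1)} + p_{33}^{(m)} Y_3^{(m)} + p_{34}^{(m)} Y_4^{(m)}] \Big\}\,\delta\bar{Y}_3^{(m)} \\
& -\sum_{m=1}^{K-1} \Big\{ [p_{21}^{(m+1)} Y_3^{(m)} + p_{22}^{(m+1)} Y_4^{(m)} + p_{23}^{(m+1)} Y_3^{(m+1)} + p_{24}^{(m+1)} Y_4^{(m+1)}] \\
& \qquad\quad + [p_{41}^{(m)} Y_3^{(m-1)} + p_{42}^{(m)} Y_4^{(m-1)} + p_{43}^{(m)} Y_3^{(m)} + p_{44}^{(m)} Y_4^{(m)}] \Big\}\,\delta\bar{Y}_4^{(m)} \\
& -[p_{31}^{(K)} Y_3^{(K-1)} + p_{32}^{(K)} Y_4^{(K-1)} + p_{33}^{(K)} Y_3^{(K)} + p_{34}^{(K)} Y_4^{(K)}]\,\delta\bar{Y}_3^{(K)} \\
& -[p_{41}^{(K)} Y_3^{(K-1)} + p_{42}^{(K)} Y_4^{(K-1)} + p_{43}^{(K)} Y_3^{(K)} + p_{44}^{(K)} Y_4^{(K)}]\,\delta\bar{Y}_4^{(K)}
\end{aligned}
\tag{76}
\]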
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It  is  noted  here  that  the  variations  at  the  far  end  are 

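\[ \delta\bar{Y}_3^{(K)} = \delta\bar{y}_b = 0 \tag{77} \]

\[ \delta\bar{Y}_4^{(K)} = \delta\bar{y}_b' = 0 \tag{78} \]

by virtue of Eqs. (36) and (37). Thus the last two terms of Eq. (76) drop
out.

It is again important to emphasize here that the computation does not
contain any condition placed at the far end boundary. The calculation starts
with the initial conditions and carries through in one direction.

The second term on the right side of Eq. (56) gives

\[
\begin{aligned}
R_2 &= \sum_{m=1}^{K} [q_1^{(m)}\ \ q_2^{(m)}\ \ q_3^{(m)}\ \ q_4^{(m)}]\ [\delta\bar{Y}_3^{(m-1)}\ \ \delta\bar{Y}_4^{(m-1)}\ \ \delta\bar{Y}_3^{(m)}\ \ \delta\bar{Y}_4^{(m)}]^T \\
&= q_1^{(1)}\,\delta\bar{Y}_3^{(0)} + q_2^{(1)}\,\delta\bar{Y}_4^{(0)}
 + \sum_{m=1}^{K-1} [q_1^{(m+1)} + q_3^{(m)}]\,\delta\bar{Y}_3^{(m)}
 + \sum_{m=1}^{K-1} [q_2^{(m+1)} + q_4^{(m)}]\,\delta\bar{Y}_4^{(m)} \\
&\quad + q_3^{(K)}\,\delta\bar{Y}_3^{(K)} + q_4^{(K)}\,\delta\bar{Y}_4^{(K)}
\end{aligned}
\tag{79}
\]

The last two terms drop out again by virtue of Eqs. (77) and (78).

The quantity q^{(m)} is again expressed as

\[ q_\ell^{(m)} = K^{-1} \int_0^1 a_\ell(\xi)\,Q^{(m)}(\xi)\,d\xi, \qquad \ell = 1, 2, 3, 4 \tag{80} \]

The first term on the right of Eq. (56) is

\[
\begin{aligned}
R_1 &= -[0\ \ 0\ \ \delta\bar{Y}_3^{(0)}\ \ \delta\bar{Y}_4^{(0)}]\ [0\ \ 0\ \ 1\ \ 0]^T \left\{ \gamma_0\,[0\ \ 0\ \ 1\ \ 0] + \alpha_0 K\,[0\ \ 0\ \ 0\ \ 1] \right\} [0\ \ 0\ \ Y_3^{(0)}\ \ Y_4^{(0)}]^T \\
&= -\delta\bar{Y}_3^{(0)} \left\{ \gamma_0 Y_3^{(0)} + \alpha_0 K Y_4^{(0)} \right\}
\end{aligned}
\tag{81}
\]

Combining all the above results and substituting into Eq. (56) we have

\[ 0 = R_1 + R_2 + R_3 \tag{82} \]

\[
\begin{aligned}
0 = {} & \Big\{ -\{\gamma_0 Y_3^{(0)} + \alpha_0 K Y_4^{(0)}\} + q_1^{(1)} - [p_{11}^{(1)} Y_3^{(0)} + p_{12}^{(1)} Y_4^{(0)} + p_{13}^{(1)} Y_3^{(1)} + p_{14}^{(1)} Y_4^{(1)}] \Big\}\,\delta\bar{Y}_3^{(0)} \\
& + \Big\{ q_2^{(1)} - [p_{21}^{(1)} Y_3^{(0)} + p_{22}^{(1)} Y_4^{(0)} + p_{23}^{(1)} Y_3^{(1)} + p_{24}^{(1)} Y_4^{(1)}] \Big\}\,\delta\bar{Y}_4^{(0)} \\
& + \sum_{m=1}^{K-1} \Big\{ [q_1^{(m+1)} + q_3^{(m)}] - [p_{11}^{(m+1)} Y_3^{(m)} + p_{12}^{(m+1)} Y_4^{(m)} + p_{13}^{(m+1)} Y_3^{(m+1)} + p_{14}^{(m+1)} Y_4^{(m+1)}] \\
& \qquad\quad - [p_{31}^{(m)} Y_3^{(m-1)} + p_{32}^{(m)} Y_4^{(m-1)} + p_{33}^{(m)} Y_3^{(m)} + p_{34}^{(m)} Y_4^{(m)}] \Big\}\,\delta\bar{Y}_3^{(m)} \\
& + \sum_{m=1}^{K-1} \Big\{ [q_2^{(m+1)} + q_4^{(m)}] - [p_{21}^{(m+1)} Y_3^{(m)} + p_{22}^{(m+1)} Y_4^{(m)} + p_{23}^{(m+1)} Y_3^{(m+1)} + p_{24}^{(m+1)} Y_4^{(m+1)}] \\
& \qquad\quad - [p_{41}^{(m)} Y_3^{(m-1)} + p_{42}^{(m)} Y_4^{(m-1)} + p_{43}^{(m)} Y_3^{(m)} + p_{44}^{(m)} Y_4^{(m)}] \Big\}\,\delta\bar{Y}_4^{(m)}
\end{aligned}
\tag{83}
\]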

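XII. RECURSIVE SOLUTIONS. Since the variations \delta\bar{Y}_3^{(0)},
\delta\bar{Y}_4^{(0)}, \delta\bar{Y}_3^{(m)}, and \delta\bar{Y}_4^{(m)} in
Eq. (83) are all arbitrary, the coefficients of all these variations must
vanish. We first take the coefficients of the variations
\delta\bar{Y}_3^{(0)} and \delta\bar{Y}_4^{(0)}:

\[
\begin{bmatrix} p_{13}^{(1)} & p_{14}^{(1)} \\ p_{23}^{(1)} & p_{24}^{(1)} \end{bmatrix}
\begin{bmatrix} Y_3^{(1)} \\ Y_4^{(1)} \end{bmatrix}
= \begin{bmatrix} (-p_{11}^{(1)} - \gamma_0) & (-p_{12}^{(1)} - \alpha_0 K) \\ (-p_{21}^{(1)}) & (-p_{22}^{(1)}) \end{bmatrix}
\begin{bmatrix} Y_3^{(0)} \\ Y_4^{(0)} \end{bmatrix}
+ \begin{bmatrix} q_1^{(1)} \\ q_2^{(1)} \end{bmatrix}
\tag{84}
\]

It is noted that Y_3^{(0)} and Y_4^{(0)} are the initial conditions of the
problem, that is, from Eqs. (67) and (46),

\[ Y_3^{(0)} = y_0 \tag{85} \]

\[ Y_4^{(0)} = \dot{y}_0 = K^{-1} y_0' \tag{86} \]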
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We  can  solve  for  Y3(l)  and  Y4(l)  in  terms  of  these  initial  conditions  by 
inverting  the  two  by  two  matrix  in  Eq.  (84). 


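\[
\begin{bmatrix} Y_3^{(1)} \\ Y_4^{(1)} \end{bmatrix}
= \begin{bmatrix} p_{13}^{(1)} & p_{14}^{(1)} \\ p_{23}^{(1)} & p_{24}^{(1)} \end{bmatrix}^{-1}
\left\{
\begin{bmatrix} (-p_{11}^{(1)} - \gamma_0) & (-p_{12}^{(1)} - \alpha_0 K) \\ (-p_{21}^{(1)}) & (-p_{22}^{(1)}) \end{bmatrix}
\begin{bmatrix} y_0 \\ K^{-1} y_0' \end{bmatrix}
+ \begin{bmatrix} q_1^{(1)} \\ q_2^{(1)} \end{bmatrix}
\right\}
\tag{87}
\]

For the general case where m = 1, 2, ..., K-1, setting the coefficients of
\delta\bar{Y}_3^{(m)} and \delta\bar{Y}_4^{(m)} in Eq. (83) to zero gives

\[
\begin{bmatrix} Y_3^{(m+1)} \\ Y_4^{(m+1)} \end{bmatrix}
= \begin{bmatrix} p_{13}^{(m+1)} & p_{14}^{(m+1)} \\ p_{23}^{(m+1)} & p_{24}^{(m+1)} \end{bmatrix}^{-1}
\left\{
- \begin{bmatrix} p_{11}^{(m+1)} + p_{33}^{(m)} & p_{12}^{(m+1)} + p_{34}^{(m)} \\ p_{21}^{(m+1)} + p_{43}^{(m)} & p_{22}^{(m+1)} + p_{44}^{(m)} \end{bmatrix}
\begin{bmatrix} Y_3^{(m)} \\ Y_4^{(m)} \end{bmatrix}
- \begin{bmatrix} p_{31}^{(m)} & p_{32}^{(m)} \\ p_{41}^{(m)} & p_{42}^{(m)} \end{bmatrix}
\begin{bmatrix} Y_3^{(m-1)} \\ Y_4^{(m-1)} \end{bmatrix}
+ \begin{bmatrix} q_1^{(m+1)} + q_3^{(m)} \\ q_2^{(m+1)} + q_4^{(m)} \end{bmatrix}
\right\}
\tag{88}
\]

We solve the above equation recursively for Y_3^{(m+1)} and Y_4^{(m+1)}.
Starting with m = 1 we have the initial conditions Y_3^{(0)} and Y_4^{(0)}
and the solutions from Eq. (87) for Y_3^{(1)} and Y_4^{(1)}. These values are
substituted into Eq. (88) to obtain Y_3^{(2)} and Y_4^{(2)}. From the values
of Y_3^{(1)}, Y_4^{(1)}, Y_3^{(2)}, and Y_4^{(2)} one can determine Y_3^{(3)}
and Y_4^{(3)}. This procedure continues until we obtain Y_3^{(K)} and
Y_4^{(K)}, which are the final values of the problem.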

XIII. NUMERICAL RESULTS AND DISCUSSION. The analysis presented in
previous sections will now be tested by way of some numerical examples. Let
us consider a simple oscillator subjected to a harmonic force. The
differential equation can be written as

\[ m\,\ddot{y} + k\,y = f_0 \cos \omega_f t, \qquad 0 \le t \le T \tag{89a} \]

where T is some finite time of interest and a dot denotes differentiation
with respect to time. The initial conditions are

\[ y(0) = y_0 \quad \text{and} \quad \dot{y}(0) = y_1 \tag{89b} \]
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The  system  of  Eqs.  (89a)  and  (89b)  is  normalized  with  respect  to  T  and  it 
becomes 


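\[ y^{*\prime\prime} + k^* y^* = f^* \cos \omega_f^* t^*, \qquad 0 \le t^* \le 1 \tag{90a} \]

and

\[ y^*(0) = y_0^* \quad \text{and} \quad y^{*\prime}(0) = y_1^* \tag{90b} \]

through the following change of parameters:

\[
t^* = \frac{t}{T}, \quad dt^* = \frac{dt}{T}, \quad y^*(t^*) = y(t), \quad y^{*\prime}(t^*) = T\,\frac{dy}{dt}, \quad
k^* = \frac{k T^2}{m}, \quad f^* = \frac{f_0 T^2}{m}, \quad \omega_f^* = \omega_f T, \quad y_0^* = y_0, \quad y_1^* = T y_1
\tag{91}
\]

Comparing Eq. (90a) with Eqs. (24) and (1), one has

\[ \alpha = \text{constant} = 1, \quad \epsilon = -k^*, \quad \beta = 0, \quad \gamma = 0, \quad \text{and} \quad Q = -f^* \cos \omega_f^* t^* \tag{92} \]

For the data presented here, we further set

\[ m = 1.0, \quad k = 1.0, \quad f_0 = 1.0, \quad \omega_f = 0.5 \]

The parameter T is given for each set of sample calculations.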

First, Eq. (84) can be used exclusively to obtain all the solutions.
This is demonstrated in Tables I through III. In these tables T has been
taken to be ten, five, and two, respectively, and the number of steps in all
cases is ten. Both y(t) and dy/dt are shown, and the exact solutions are
given in parentheses for comparison. It is clear that the results are
convergent, i.e., they improve as the interval of time is decreased.

TABLE I. SOLUTION TO A FORCED VIBRATION PROBLEM OF A SIMPLE OSCILLATOR
(0 ≤ t ≤ 10, Ten Steps. Exact Solution Shown in Parentheses)

     t        y(t)                    dy/dt
    0.0      1.0000  (Given)         1.000  (Given)
    2.0      1.7590  ( 1.7684)      -0.711  (-0.674)
    4.0     -1.1495  (-1.0938)      -1.450  (-1.512)
    6.0     -1.8534  (-1.9195)       0.867  ( 0.773)
    8.0      0.2261  ( 0.1663)       0.564  ( 0.689)
   10.0     -0.0531  ( 0.1139)      -0.404  (-0.381)

TABLE II. SOLUTIONS TO A FORCED VIBRATION PROBLEM OF A SIMPLE OSCILLATOR
(0 ≤ t ≤ 5, Ten Steps. Exact Solutions Shown in Parentheses)

     t        y(t)                    dy/dt
    0.0      1.0000  (Given)         1.0000  (Given)
    1.0      1.8314  ( 1.8315)       0.4991  ( 0.5012)
    2.0      1.7646  ( 1.7684)      -0.6828  (-0.6740)
    3.0      0.5536  ( 0.5654)      -1.6161  (-1.6079)
    4.0     -1.1074  (-1.0938)      -1.5060  (-1.5121)
    5.0     -2.1221  (-2.1217)      -0.4129  (-0.4350)

TABLE III. SOLUTIONS TO A FORCED VIBRATION PROBLEM OF A SIMPLE OSCILLATOR
(0 ≤ t ≤ 2.0, Ten Steps. Exact Solutions Shown in Parentheses)

     t        y(t)                    dy/dt
    0.0      1.0000  (Given)         1.0000  (Given)
    0.4      1.3892  ( 1.3892)       0.9184  ( 0.9184)
    0.8      1.7132  ( 1.7132)       0.6760  (-0.6760)
    1.2      1.9116  ( 1.9117)       0.2961  ( 0.2966)
    1.6      1.9379  ( 1.9382)      -0.1752  (-0.1742)
    2.0      1.7676  ( 1.7684)      -0.6754  (-0.6740)
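The step-by-step use of Eq. (84) on the oscillator example can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' program: the load integrals of Eq. (80) are evaluated by composite Simpson's rule, the initial values y(0) = dy/dt(0) = 1 are read from the tables, the closed-form reference solution was derived by elementary means for the stated data, and all function names (`inner`, `load`, `march`, `exact`) are hypothetical.

```python
import math
from fractions import Fraction

# Cubic Hermite shape functions of Eq. (64), ascending powers of xi.
A = [[1, 0, -3, 2], [0, 1, -2, 1], [0, 0, 3, -2], [0, 0, -1, 1]]
DA = [[k * c for k, c in enumerate(p)][1:] for p in A]   # Eq. (65)

def inner(p, q):
    """Exact integral over [0, 1] of the product of two polynomials."""
    return float(sum(Fraction(ci * cj, i + j + 1)
                     for i, ci in enumerate(p) for j, cj in enumerate(q)))

B = [[inner(p, q) for q in A] for p in A]      # Eq. (60)
E = [[inner(p, q) for q in DA] for p in DA]    # Eq. (63)

def load(step, K, f_star, w_star, n=40):
    """q_1 and q_2 of Eq. (80) by Simpson's rule, with Q = -f* cos(w* t*)."""
    out = []
    for p in A[:2]:
        total = 0.0
        for i in range(n + 1):
            xi = i / n
            weight = 1 if i in (0, n) else (4 if i % 2 else 2)
            a_val = sum(c * xi ** k for k, c in enumerate(p))
            t_star = (xi + step - 1) / K           # inverse of Eq. (41)
            total += weight * a_val * (-f_star) * math.cos(w_star * t_star)
        out.append(total / (3.0 * n) / K)
    return out

def march(T, K, m=1.0, k=1.0, f0=1.0, wf=0.5, y0=1.0, v0=1.0):
    """Apply Eq. (84) element by element; returns a list of (t, y, dy/dt)."""
    k_star, f_star, w_star = k * T * T / m, f0 * T * T / m, wf * T
    alpha, eps = 1.0, -k_star                      # beta = gamma = 0 here
    P = [[eps / K * B[i][j] + alpha * K * E[i][j] for j in range(4)]
         for i in range(4)]                        # Eq. (58)
    Y3, Y4 = y0, T * v0 / K                        # Eqs. (85), (86)
    det = P[0][2] * P[1][3] - P[0][3] * P[1][2]
    rows = [(0.0, y0, v0)]
    for step in range(1, K + 1):
        q1, q2 = load(step, K, f_star, w_star)
        r1 = -P[0][0] * Y3 - (P[0][1] + alpha * K) * Y4 + q1
        r2 = -P[1][0] * Y3 - P[1][1] * Y4 + q2
        # Solve the 2 x 2 system of Eq. (84) by Cramer's rule.
        Y3, Y4 = ((r1 * P[1][3] - P[0][3] * r2) / det,
                  (P[0][2] * r2 - P[1][2] * r1) / det)
        rows.append((step * T / K, Y3, Y4 * K / T))
    return rows

def exact(t):
    """Closed form for m = k = f0 = 1, wf = 0.5, y(0) = dy/dt(0) = 1."""
    y = -math.cos(t) / 3.0 + math.sin(t) + 4.0 / 3.0 * math.cos(t / 2.0)
    v = math.sin(t) / 3.0 + math.cos(t) - 2.0 / 3.0 * math.sin(t / 2.0)
    return y, v
```

Calling `march(2.0, 10)` steps through the case of Table III, and `exact(t)` gives the reference values for comparison. Agreement with the tabulated digits is not guaranteed, since the quadrature used in the original computation is not stated.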
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Some  discussion  on  the  present  formulation  compared  with  previous  work 
[3,4]  is  in  order  here.  In  previous  work  on  unconstrained,  adjoint 
variational  formulation,  the  point  of  emphasis  was  to  free  the  requirements  of 
satisfying  any  of  the  initial  conditions  and  to  let  the  approximate  solution 
converge  to  them.  In  the  present  analysis  it  is  shown  that  the  far  end 
conditions  need  not  be  considered  in  a  variational  formulation  of  approximate 
solutions.  A  more  detailed  comparison  in  terms  of  numerical  convergence, 
competency,  efficiency,  etc.  is  planned. 
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ABSTRACT.  Some  recently  developed  fully  discrete  methods  for  the  numer¬ 
ical  solution  of  linear,  second-order  systems  of  ordinary  differential  equa¬ 
tions,  arising  e.g.  from  finite  difference  or  finite  element  semidiscretizations 
of  hyperbolic  equations,  are  reviewed.  These  methods  are  of  high  order  of 
accuracy,  have  desirable  stability  properties  and  are  computationally  efficient. 
Extensions  to  problems  with  first-order  time  derivative  terms  (arising  e.g. 
from  the  equations  of  structural  dynamics)  and  nonlinear  problems  are  also 
considered. 


1.  INTRODUCTION.  This  paper  traces  several  investigations  which  the 
authors  have  pursued  in  the  study  of  second-order  systems  of  differential 
equations.  We  first  consider  a  linear  problem  in  an  abstract  setting  and 
recall  the  two-step  "cosine"  schemes  studied  first  in  [3]  for  a  homogeneous 
problem  and  extended  in  [6]  to  the  nonhomogeneous  problem.  Next,  we  introduce 
a  first-order  time-derivative  into  the  problem;  in  [7]  we  establish  that 
several  methods  in  the  literature  can  be  explained  as  particular  examples  of 

a  general  class  of  implicit  Runge-Kutta  methods  when  specified  to  these  prob¬ 
lems.  Finally,  we  make  some  preliminary  remarks  on  a  class  of  "Rosenbrock"- 
like  methods  which  we  are  currently  proposing  for  some  nonlinear  second-order 
problems  [12]. 

2. COSINE METHODS. We consider a second-order evolution equation in an
abstract setting. Let H be a complex Hilbert space endowed with an inner
product (·, ·) and corresponding norm ||·||. Let A be a linear operator
with domain \mathcal{D}(A), dense in H. A is assumed to be positive definite
and self-adjoint on \mathcal{D}(A); it may be unbounded. For u^0, u_t^0 given
in H, the problem to be solved is

\[ u_{tt} + A u = 0, \quad 0 < t \le t^*; \qquad u(0) = u^0, \quad u_t(0) = u_t^0 \tag{1} \]

For example, with H = \mathbb{C}^N and A a Hermitian, positive definite
N × N matrix, (1) is just a system of ordinary differential equations. Such
systems can also be realized as semidiscretizations (finite difference or
Galerkin) of certain second-order hyperbolic partial differential equations.

It is well known that the solution to (1), under the assumptions
u^0 \in \mathcal{D}(A), u_t^0 \in \mathcal{D}(A^{1/2}), can be obtained uniquely as
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Further,  the  solution  (2)  satisfies  the  recursion  (for  h  >  0  constant) 

\[ u(t + 2h) - 2\cos(h A^{1/2})\,u(t + h) + u(t) = 0 \tag{3} \]

The approximation scheme is based on (3). We suppose that R(\tau) is a
rational approximation to \cos\tau, \tau \ge 0, which satisfies

\[ |R(\tau) - \cos\tau| \le C\,\tau^{\nu+2}, \qquad 0 \le \tau \le \sigma \tag{4} \]

\[ |R(\tau)| \le 1, \qquad 0 \le \tau \le \eta \tag{5} \]

for certain constants C, \sigma, \eta, \nu. (4) represents an accuracy
requirement, while (5) is for stability of the scheme; when \eta = +\infty,
we shall obtain unconditionally stable approximation schemes. Then, we
determine w^n \in \mathcal{D}(A) to approximate u(nh) by the three-term
recurrence

\[ w^{n+2} - 2R(h A^{1/2})\,w^{n+1} + w^n = 0 \tag{6} \]

We prove in [3] the error estimate

\[ \max_n \| w^n - u(nh) \| = O(h^\nu) \tag{7} \]

In order for the approximation scheme (6) to be effective, we propose
the approximations R(\tau) determined from the generating relation

\[ (1 + x^2 z^2)^s \cos z = \sum_{n=0}^{\infty} \phi_n^{(s)}(x)\,z^{2n} \tag{8} \]

\[ \phi_n^{(s)}(x) = \sum_{j=0}^{n} \frac{(-1)^j}{(2j)!} \binom{s}{n-j}\,x^{2(n-j)} \tag{9} \]

By setting z = \tau and truncating at n = s, we obtain the rational
approximation

\[ R_s(x; \tau) = \frac{\sum_{n=0}^{s} \phi_n^{(s)}(x)\,\tau^{2n}}{(1 + x^2\tau^2)^s} \tag{10} \]

which satisfies (4) with \nu = 2s and for which there exists a parameter
x^{(s)} > 0 such that (5) holds with \eta = +\infty for x \ge x^{(s)}. The
computational advantages of (10) are two-fold: i) R(\tau) is, in fact, a
function of \tau^2, so A^{1/2} is never computed, and ii) the denominator of
R(h A^{1/2}) is of the form (I + x^2 h^2 A)^s, hence one matrix decomposition
and s back solves are required for each time step.
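The coefficients of the rational cosine approximation can be generated directly from the generating relation (8). The sketch below is illustrative and not taken from the cited papers: it expands the product of the binomial series and the cosine series, uses the argument `x2` to stand for x^2, and the function names are assumptions.

```python
import math
from fractions import Fraction

def phi(s, n, x2):
    """Coefficient of z**(2n) in (1 + x2 * z**2)**s * cos(z); x2 stands for x**2."""
    return sum(Fraction((-1) ** j, math.factorial(2 * j))
               * math.comb(s, n - j) * Fraction(x2) ** (n - j)
               for j in range(n + 1))

def cosine_rational(s, x2, tau):
    """R_s(x; tau) of Eq. (10): the series truncated at n = s over (1 + x2 * tau**2)**s."""
    numerator = sum(float(phi(s, n, x2)) * tau ** (2 * n) for n in range(s + 1))
    return numerator / (1.0 + float(x2) * tau * tau) ** s

# Example with s = 2 and the hypothetical parameter value x**2 = 1/4.
beta = Fraction(1, 4)
coefficients = [phi(2, n, beta) for n in range(3)]
error_at_small_tau = abs(cosine_rational(2, beta, 0.1) - math.cos(0.1))
```

For s = 2 the three coefficients are 1, 2x^2 - 1/2, and x^4 - x^2 + 1/24, and the difference from the cosine is of order tau^6, in line with the accuracy requirement (4) for \nu = 2s = 4.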

The cosine approximation (10) is related to a class of rational
approximations to the exponential introduced by Baker and Bramble [2]. This
connection enables us in [8] to employ the "real pole sandwich" results of
Nørsett and Wanner [10] and the "order stars" of Wanner, Hairer, and Nørsett
[14] to further study stability and accuracy dependence on the parameters x
and s.

The extension of the cosine schemes to nonhomogeneous equations

\[ u_{tt} + A u = g(t) \tag{11} \]

is considered in [6]. The recurrence relation becomes

\[ u^{n+2} - 2\cos(h A^{1/2})\,u^{n+1} + u^n = \gamma^n \equiv h \int_0^1 A^{-1/2} \sin(h A^{1/2} s)\left[ g((n + 2 - s)h) + g((n + s)h) \right] ds \tag{12} \]

Approximating (12) by

\[ w^{n+2} - 2R(h A^{1/2})\,w^{n+1} + w^n = \delta^n \tag{13} \]

we show that if we select \delta^n so that \|\gamma^n - \delta^n\| = O(h^{\nu+2}),
then the error estimate \|w^n - u^n\| = O(h^\nu) is maintained.

In order for the scheme (13) to be viable, the choice of \delta^n as a
quadrature for \gamma^n is nonstandard. For example, a fourth-order scheme
(\nu = 4) uses (with \beta = x^2 a parameter)

\[
\begin{aligned}
R(h A^{1/2}) &= (I + \beta h^2 A)^{-2}\left[ I + (2\beta - \tfrac{1}{2})\,h^2 A + (\beta^2 - \beta + \tfrac{1}{24})\,h^4 A^2 \right] \\
\delta^n &= (I + \beta h^2 A)^{-2}\left[ \tfrac{h^2}{12}\left\{ (g^{n+2} + 10\,g^{n+1} + g^n) + h^2 (24\beta - 1)\,A\,g^{n+1} \right\} \right]
\end{aligned}
\tag{14}
\]

The result of (14) is an algorithm implemented as follows. Define
B = I + \beta h^2 A. Then we determine p^n via A p^n = g^{n+2} + 10 g^{n+1} + g^n,
define \psi^n = \tfrac{h^2}{12}[p^n + h^2 (24\beta - 1)\,g^{n+1}], solve
B q^{(1)} = -h^2 [I + (2\beta - \tfrac{1}{12})\,h^2 A]\,w^{n+1} + \psi^n,
solve B q^{(2)} = A q^{(1)}, and set w^{n+2} = 2 w^{n+1} - w^n + q^{(2)}.

Finally, we observe that if A = M^{-1} K with M and K sparse (e.g. the
Galerkin semidiscretization of the wave equation), each of the equations to
be solved in the above algorithm can be multiplied through by M and hence
all matrix operations are sparse.
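The steps of the fourth-order algorithm can be illustrated on a scalar model problem u'' + a u = g(t), where every "solve" reduces to a division. This is a sketch under stated assumptions rather than the authors' implementation: the starting values w^0 and w^1 are taken from a known exact solution, the value beta = 1/4 is an arbitrary illustrative choice, and the function names are hypothetical.

```python
import math

def cosine_step(w_prev, w_curr, g_prev, g_curr, g_next, a, h, beta):
    """One step w^n, w^{n+1} -> w^{n+2} of the fourth-order scheme, scalar case."""
    B = 1.0 + beta * h * h * a
    p = (g_next + 10.0 * g_curr + g_prev) / a     # a * p = g^{n+2} + 10 g^{n+1} + g^n
    psi = h * h / 12.0 * (p + h * h * (24.0 * beta - 1.0) * g_curr)
    rhs = -h * h * (1.0 + (2.0 * beta - 1.0 / 12.0) * h * h * a) * w_curr + psi
    q1 = rhs / B                                  # solve B q1 = rhs
    q2 = a * q1 / B                               # solve B q2 = a q1
    return 2.0 * w_curr - w_prev + q2

def integrate(a, g, u_exact, h, steps, beta=0.25):
    """March to t = steps * h, starting from exact values at t = 0 and t = h."""
    w_prev, w_curr = u_exact(0.0), u_exact(h)
    for n in range(steps - 1):
        t = n * h
        w_next = cosine_step(w_prev, w_curr, g(t), g(t + h), g(t + 2.0 * h),
                             a, h, beta)
        w_prev, w_curr = w_curr, w_next
    return w_curr

def model_solution(t):
    """Solves u'' + 4u = cos(t): particular part cos(t)/3 plus cos(2t)."""
    return math.cos(t) / 3.0 + math.cos(2.0 * t)

approx = integrate(4.0, math.cos, model_solution, h=0.05, steps=20)
error = abs(approx - model_solution(1.0))
```

In the matrix case the two divisions by B become two back solves with a single factorization of I + beta h^2 A, which is the computational point made in the text.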
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3.  DIAGONALLY  IMPLICIT  RUNGE-KUTTA  FOR  DAMPED  PROBLEMS.  We  next  seek 
to  extend  our  considerations  to  second-order  systems  with  first-order  time 
derivatives  also  present.  While  a  class  of  two-step  schemes  extending  (6) 
to certain of these equations has been studied by one of us in [13], we
find that a more efficient treatment is afforded by the application of
certain implicit Runge-Kutta methods studied by Crouzeix [5] and Alexander
[1] on an equivalent first-order system. We specifically study the linear
structural dynamics equation

\[ M y_{tt} + C y_t + K y = f(t) \tag{15} \]

with M, C, K sparse, positive definite N × N matrices. Brusa and Nigro
[4]  propose  a  (globally)  third-order,  computationally  efficient  single-step 
scheme  for  (15);  our  goal  in  [7]  is  to  identify  their  method  as  equivalent 
to  a  specific  choice  of  implicit  Runge-Kutta  method,  which  then  allows  us 
to  generalize  their  scheme. 

A  first-order  system  equivalent  to  (15)  is 


which,  with  obvious  notation,  is 

07) 

or  yt  -  4  9  +  '3  .  A  =  -if1  1  ,  5  =  f~]  7  08) 

To (18), we employ Crouzeix's form of implicit Runge-Kutta scheme, defined
for an equation of the form y_t = F(t, y) by a tableau of coefficients
In particular, Alexander discusses the DIRK (diagonally-implicit Runge-Kutta)
schemes, wherein a_{ij} = 0 for i < j and all a_{ii} = \beta .  It is easy
to see that for (16) (hence (15)), a DIRK scheme is accomplished by the
following algorithm:

For i = 1, ..., q ,

(i)    Let    r_i = u_n + h \sum_{j=1}^{i-1} a_{ij} u_{n,j}

              s_i = v_n + h \sum_{j=1}^{i-1} a_{ij} v_{n,j}

              f_{n,i} = f(t_{n,i})
                                                                        (22)
(ii)   Solve  (M + \beta h C + \beta^2 h^2 K) v_{n,i} = -K (r_i + \beta h s_i) - C s_i + f_{n,i}

(iii)  Set    u_{n,i} = \beta h v_{n,i} + s_i

Then, form    u_{n+1} = u_n + h \sum_{i=1}^{q} b_i u_{n,i}

and           v_{n+1} = v_n + h \sum_{i=1}^{q} b_i v_{n,i} .
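The steps of algorithm (22) can be sketched in a few lines. The following is only an illustration under stated assumptions: it treats the scalar case (m, c, k are numbers, so the linear solve in step (ii) is a division), it takes the stage times as t_{n,i} = t_n + tau_i h with tau_i the row sums of the tableau, and it uses the commonly quoted two-stage, second-order coefficients with \beta = 1 - 1/\sqrt{2}, which are not copied from the tables in [1]. The names `dirk_step` and `integrate` are hypothetical.

```python
import math

def dirk_step(m, c, k, f, t, u, v, h, beta, a, b):
    """One step of algorithm (22) for m*y'' + c*y' + k*y = f(t), scalar case."""
    q = len(b)
    lhs = m + beta * h * c + beta ** 2 * h ** 2 * k   # same operator for every stage
    us, vs = [], []                                    # stage values u_{n,i}, v_{n,i}
    for i in range(q):
        # step (i): explicit parts built from the earlier stages
        r_i = u + h * sum(a[i][j] * us[j] for j in range(i))
        s_i = v + h * sum(a[i][j] * vs[j] for j in range(i))
        tau_i = sum(a[i][: i + 1])                     # assumed stage abscissa
        f_i = f(t + tau_i * h)
        # step (ii): one solve with the fixed operator (a division for scalars)
        v_i = (-k * (r_i + beta * h * s_i) - c * s_i + f_i) / lhs
        # step (iii)
        u_i = beta * h * v_i + s_i
        us.append(u_i)
        vs.append(v_i)
    u_new = u + h * sum(b[i] * us[i] for i in range(q))
    v_new = v + h * sum(b[i] * vs[i] for i in range(q))
    return u_new, v_new

# Assumed two-stage coefficients (second order); illustrative only.
BETA = 1.0 - 1.0 / math.sqrt(2.0)
A = [[BETA, 0.0], [1.0 - BETA, BETA]]
B = [1.0 - BETA, BETA]

def integrate(m, c, k, f, u0, v0, t_end, n_steps):
    """Advance displacement u and velocity v from t = 0 to t_end."""
    h = t_end / n_steps
    u, v = u0, v0
    for n in range(n_steps):
        u, v = dirk_step(m, c, k, f, n * h, u, v, h, BETA, A, B)
    return u, v
```

For matrix-valued M, C, K the division in step (ii) becomes a sparse linear solve, and because the operator M + \beta h C + \beta^2 h^2 K is the same for every stage and every step, a single factorization can be reused throughout.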


The study of the algorithm (22) is facilitated when one notes the
correspondence with certain rational exponential approximations discussed by
Nørsett [9].  Of particular note, though, is the information conveyed in the
t_{n,i} , where loads are to be evaluated, which was not found in earlier work
specifically for the linear problem.  Alexander [1] tabulates the coefficients
for several specific A-stable and strongly S-stable DIRK schemes, to which we
apply (22) for a model problem in [7].


4.  A SCHEME FOR A NONLINEAR PROBLEM.  Our current interest encompasses
certain nonlinear second-order problems of the form u_{tt} = F(t, u, u_t) , but
we shall limit our consideration here to the simpler problem u_{tt} = G(u) .
Of course, we could apply the implicit Runge-Kutta methods indicated above
(or some similar methods suggested first by Rosenbrock [11]); again, we shall
defer these considerations to a future paper.  Our goal here is to produce a
Rosenbrock-type method which is fourth-order accurate, two-stage, and reduces to
a scheme (mentioned above) introduced by Baker and Bramble when applied with
G(u) = -Au .  In particular, each scheme in this class is stable on a strip
containing the imaginary axis for a certain choice of parameter.
Again, we convert to a first-order system of the form
Y_t = F(Y) ,   Y = [u, v]^T ,   v = u_t .

The idea of Rosenbrock was to introduce the Jacobian F_Y directly into the
scheme, rather than introducing it later in a Newton-like effort to solve
nonlinear algebraic equations.  For the Baker-Bramble analogue, though, we
introduce the square of the Jacobian F_Y .  Specifically, we propose a two-stage
computation of the form

[I - \beta h^2 F_Y^2(Y_n)] K_1 = [I + \alpha h F_Y(Y_n)] F(Y_n)

[I - \beta h^2 F_Y^2(Y_n)] K_2 = [I + \epsilon h F_Y(Y_n)] F(Y_n + \gamma h K_1)
                                 + \delta h F_Y(Y_n + \eta h K_1) F(Y_n + \nu h K_1) - \mu K_1    (23)

Y_{n+1} = Y_n + h (b_1 K_1 + b_2 K_2)

Applied to a linear problem, i.e. F(Y) = BY , it can be seen easily
that (23) reduces to the difference equation Y_{n+1} = r(hB) Y_n , where

r(z) = \frac{1 + z + (\frac{1}{2} - 2\beta) z^2 + (\frac{1}{6} - 2\beta) z^3 + (\frac{1}{24} - \beta + \beta^2) z^4}{(1 - \beta z^2)^2}    (24)

is indeed the fourth-order Baker-Bramble exponential approximation.
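The order claim can be checked numerically. The sketch below assumes our reading of (24): a numerator equal to the degree-four truncation of (1 - \beta z^2)^2 e^z over the denominator (1 - \beta z^2)^2. It estimates the exponent p in |r(z) - e^z| ~ |z|^p on the imaginary axis; a fourth-order approximation should give p close to 5. The function names and the sample value \beta = 0.1 are illustrative choices, not taken from the paper.

```python
import cmath
import math

def r(z, beta):
    """Rational approximation to exp(z) as read from (24)."""
    num = (1 + z + (0.5 - 2 * beta) * z ** 2 + (1 / 6 - 2 * beta) * z ** 3
           + (1 / 24 - beta + beta ** 2) * z ** 4)
    return num / (1 - beta * z ** 2) ** 2

def observed_order(beta, y=0.1):
    """Estimate p in |r(iy) - exp(iy)| ~ y**p by halving y once."""
    e1 = abs(r(1j * y, beta) - cmath.exp(1j * y))
    e2 = abs(r(0.5j * y, beta) - cmath.exp(0.5j * y))
    return math.log(e1 / e2) / math.log(2.0)
```

Calling `observed_order(0.1)` should return a value near 5, i.e. a local error of O(z^5), consistent with fourth-order accuracy.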

When converted over to the notation of the second-order problem, letting
K_i = [P_i, Q_i]^T , we find that (23) requires that we solve at each time step
four systems with the same N × N system matrix

[I - \beta h^2 G_u(u_n)] P_1 = v_n + \alpha h G(u_n)

[I - \beta h^2 G_u(u_n)] Q_1 = G(u_n) + \alpha h G_u(u_n) v_n

[I - \beta h^2 G_u(u_n)] P_2 = v_n + \gamma h Q_1 + \epsilon h G(u_n + \gamma h P_1)
                               + \delta h G(u_n + \nu h P_1) - \mu P_1                    (25)

[I - \beta h^2 G_u(u_n)] Q_2 = \epsilon h G_u(u_n) [v_n + \gamma h Q_1] + G(u_n + \gamma h P_1)
                               + \delta h G_u(u_n + \eta h P_1) [v_n + \nu h Q_1] - \mu Q_1
The parameters of the scheme are determined so that the local truncation
error (determined here for the scalar problem Y_t = F(Y)) is O(h^5) .  This
requires that

b_1 K_1 + b_2 K_2 = F + \frac{h}{2} F_Y F + \frac{h^2}{6} [F_{YY} F^2 + F_Y^2 F]
                    + \frac{h^3}{24} [F_{YYY} F^3 + 4 F_{YY} F_Y F^2 + F_Y^3 F] + O(h^4)    (26)

By requiring that the parameter \beta remain free for stability assignment,
(26) eventuates a set of nine nonlinear algebraic equations in the nine remaining
parameters (cf. [12] for details).  We have ascertained the existence of many
possible solutions of this system; it remains for us to determine which of
these solutions provides the best (e.g. smallest error constant) of the
schemes (25) and to report the results of ongoing numerical experiments.
ABSTRACT.  It is well known that numerical algorithms for the approximate
solution of first order hyperbolic partial differential equations which are
stable for the Cauchy problem and for scalar initial-boundary value problems are
often unstable when used for initial-boundary value problems for systems.  These
instabilities arise from the particular boundary conditions used to close the
discrete system.  Two methods of generating stable boundary treatments are
presented.  The first is applicable to finite difference and Galerkin finite
element schemes and is based on the theory of characteristics.  The second scheme
is based on "energy" estimates of the solution in norms equivalent to L_2 , but
which lead to different discretizations.  The latter scheme is used in conjunction
with Galerkin finite element methods.

1.  INTRODUCTION.  It is often observed that instabilities in solving
hyperbolic equations are caused by the incorrect treatment of the boundary
conditions.  Algorithms which are stable for the Cauchy problem can be unstable
when used in conjunction with particular boundary treatments.  What clouds the
issue further is the observation that boundary treatments which are stable for
scalar hyperbolic equations may be unstable when applied to systems.

Consider the system

\begin{pmatrix} u \\ v \end{pmatrix}_t = \begin{pmatrix} 1/2 & 1 \\ 1 & 1/2 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}_x ,   0 < x < 1    (1)

with the initial conditions u(x, 0) , v(x, 0) and the boundary conditions
u(0, t) , u(1, t) given.  It can be shown that this problem is well posed.

In  solving  the  equations  numerically  one  generally  needs  special  equations  to 
find  v  at  both  boundaries,  even  though  analytically  it  is  determined.  A 
natural  procedure  is  to  do  something  special  for  the  variable  v  by  itself 
since  u  is  determined  at  both  boundaries  by  the  given  boundary  conditions. 

It  is  shown  in  [1]  that  if  one  uses  the  Lax-Wendroff  finite  difference  method 
to  solve  (1)  in  the  interior  and  uses  quadratic  extrapolation  to  determine  v 
at  both  boundaries,  the  resulting  scheme  is  unstable.  This  is  in  spite  of 
the fact that in [2] it is shown that this scheme is stable for scalar
hyperbolic equations.  Similarly, in [3] it is shown that the obvious use of the
Galerkin  finite  element  method  to  solve  (1)  is  unstable,  although  the  scalar 
case  is  well  behaved. 
In this paper we present two boundary treatments which are stable for
systems of hyperbolic equations.  The first scheme is discussed in Sections
2 and 3 and is based on the use of characteristics to carry correct information
to and from boundaries.  The second scheme, discussed in Section 4, is based
on measuring the solution of the differential equation in a norm which,
although equivalent to the L_2-norm, leads to a different discretization.

2.  THE CHARACTERISTIC SCHEME FOR FINITE DIFFERENCES.  For finite
differences we will use the theory developed in [2].  Without loss of generality
we assume that the system is in characteristic form, i.e.

u_t = \Lambda u_x ,   0 < x < 1    (2)

where \Lambda is a diagonal q × q matrix with elements \ne 0 .  This system can
be partitioned into the systems

u^I_t = \Lambda^I u^I_x ,   \Lambda^I < 0 ,   \Lambda^I : p × p    (3)

u^{II}_t = \Lambda^{II} u^{II}_x ,   \Lambda^{II} > 0 ,   \Lambda^{II} : r × r ,   p + r = q    (4)

together with the initial conditions

u(x, 0) = f(x)    (5)

and boundary conditions

u^I(0, t) = S_0 u^{II}(0, t) + g_0(t)    (6)

u^{II}(1, t) = S_1 u^I(1, t) + g_1(t) .    (7)

In the interior we use a scheme

Q_{-1} u_j^{n+1} = \sum_{k=0}^{s} Q_k u_j^{n-k} .    (8)

At the boundary x = 0 we have two types of conditions.  The first uses the
given boundary condition (6), i.e.

(u_0^I)^n = S_0 (u_0^{II})^n + g_0^n .    (9)

The second kind of boundary treatment is a numerical scheme for the remaining
unknowns u_0^{II} .  By construction this operation involves only these unknowns:

(u_0^{II})^{n+1} = \sum_{k=0}^{s} T_k (u^{II})^{n-k} .    (10)

We note that in general the operators Q_k and T_k appearing in (8) and (10) are
all block diagonal.  Only S_0 in (6) need not be diagonal.  In addition to the
boundary treatment (9) - (10), there is an analogous set at x = 1 .


We assume that the scheme (8) is stable for the Cauchy problem and that
the scheme (8) - (10) is stable for the scalar semi-infinite problem, i.e. we
have 0 < x < \infty , u a scalar with u = u^I or u = u^{II} .  Using these
assumptions we can prove the following.  (The proofs are supplied in [4].)

Proposition 1 - The scheme (8) - (10) for the semi-infinite vector problem
is stable.

Theorem 2 - The scheme (8) - (10), along with the corresponding boundary
treatment at x = 1 , is stable for the vector initial-boundary value
problem on 0 < x < 1 .

In  practice  one  may  deal  directly  with  the  non-diagonal  form  of  the  equations 
and,  at  least  for  explicit  schemes,  we  may  implement  the  above  boundary  treatment 
as  a  post-correction  to  an  existing  and  perhaps  unstable  algorithm.  We  illustrate 
these  points  by  the  use  of  an  example. 


Consider the system (1) with initial conditions

u(x, 0) = u_0(x) ,   v(x, 0) = v_0(x) ,    (11)

and boundary conditions

u(0, t) = g_0(t) ,   u(1, t) = g_1(t) .    (12)

We then find that

(u + v)_t = \frac{3}{2} (u + v)_x
                                            (13)
(u - v)_t = -\frac{1}{2} (u - v)_x


so that at x = 0 , (u + v) is the characteristic variable coming into the
boundary while (u - v) is the characteristic variable moving away from the
boundary.  We denote by U_0 , V_0 the values of u and v at the boundary
x = 0 calculated by some scheme.  We then set

u(0, t) = g_0(t)
                                            (14)
u(0, t) + v(0, t) = U_0 + V_0

or, solving,

u(0, t) = g_0(t)
                                            (15)
v(0, t) = V_0 + [U_0 - g_0(t)] .

Similarly

u(1, t) = g_1(t)
                                            (16)
v(1, t) = V_1 + [g_1(t) - U_1] .

Thus, the bracketed terms in (15) and (16) can be considered as correction
terms to a given algorithm and as such, one may keep all coding in existing
programs and one only need calculate the correction terms.  For explicit
schemes, the method suggested by (14) is exactly equivalent to doing the
boundary treatment on the characteristic variables.
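As a minimal sketch of how the post-correction (15) - (16) might be bolted onto an existing explicit code, the function below takes the provisional boundary values produced by any interior scheme and returns the corrected ones. The function name and argument names are hypothetical; only the two formulas come from the text.

```python
def correct_boundaries(U0, V0, g0, U1, V1, g1):
    """Apply the characteristic post-correction (15)-(16) for system (1).

    U0, V0 : provisional u, v at x = 0 from some scheme
    U1, V1 : provisional u, v at x = 1 from some scheme
    g0, g1 : prescribed boundary data u(0, t), u(1, t) at the new time level
    """
    # x = 0: impose u = g0 and keep the incoming variable u + v unchanged
    u_left = g0
    v_left = V0 + (U0 - g0)
    # x = 1: impose u = g1 and keep the incoming variable u - v unchanged
    u_right = g1
    v_right = V1 + (g1 - U1)
    return (u_left, v_left), (u_right, v_right)
```

The design point is that the correction preserves exactly the characteristic combination arriving at each boundary (u + v at x = 0, u - v at x = 1) while overwriting u with the given data, so no other part of the existing program has to change.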


3.  THE CHARACTERISTIC SCHEME FOR FINITE ELEMENTS.  In this section we
describe the characteristic scheme in conjunction with Galerkin methods.
Without loss of generality, we may consider the system

u_t = \Lambda u_x + F   for 0 < x < 1    (17)

where \Lambda is a diagonal matrix.  Again, we can partition \Lambda , F , and u
so that

u^I_t = \Lambda^I u^I_x + F^I   and   u^{II}_t = \Lambda^{II} u^{II}_x + F^{II} .    (18)

We impose the initial condition

u^I(x, 0) = u_0^I(x)   and   u^{II}(x, 0) = u_0^{II}(x)    (19)

and the boundary conditions

u^I = S_0 u^{II}   at x = 0    (20)

u^{II} = S_1 u^I   at x = 1 .    (21)

Problems with inhomogeneous boundary conditions may easily be converted into
a problem of the type (18) - (21).

The L_2-Galerkin formulation of the problem (18) - (21) is given by:
find u^I and u^{II} such that

(v^I, u^I_t - \Lambda^I u^I_x - F^I) = 0
                                                        (22)
(v^{II}, u^{II}_t - \Lambda^{II} u^{II}_x - F^{II}) = 0

for all v^I and v^{II} in suitable vector spaces.  Here

(v, u) = \int_0^1 v^T u \, dx .    (23)

At any time t , an approximation to the solution of (22) is found by
seeking (u^I, u^{II}) \in U_1 × U_2 such that (22) holds for all (v^I, v^{II}) \in
V_1 × V_2 where U_1 , U_2 , V_1 , and V_2 are finite dimensional spaces.  Here we
assume that V_i = U_i except perhaps for boundary effects.  Furthermore, we
shall assume that, e.g. V_1 = H × H × ... × H where H is a space of
scalar functions and where the number of products is determined by the
dimension of u^I .

Let  {ip i/'j)  be  a  basis  for  H  .  We  may  construct  another  basis 
for  H  by  setting  (assuming  that  ( 0)  +  0  and  ^(1)  f  0) 

,  %  .  Ml)  „ 

^0(X)  =  *0(X)  -  ^yy  *j(l) 

*j(0)  „ 

0j(x)  =  *j(x)  -  i0(0) 

and 

,  ,  MO)  i,(l) 

Vx)  =  (,j(x)  -  ¥*>  -  ^tt  <,j(x)  for  1 1  j  i ■ 

Then  the  new  basis  has  the  properties  that 

'J'j(O)  =  0  for  j  >  0  and  ^(1)  =  0  for  j  <  J  .  (24) 

If p and r are the dimensions of u^I and u^{II} , respectively, then we
choose the bases

{ e_k^p \phi_j } ,   k = 1, ..., p ,   j = 0, ..., J

and

{ e_k^r \phi_j } ,   k = 1, ..., r ,   j = 0, ..., J

for the spaces U_1 and U_2 , respectively, where e_k^p and e_k^r are the k-th
unit vectors of dimension p and r , respectively, and \phi_j denotes the
modified basis functions satisfying (24).  Then since u^I \in U_1
and u^{II} \in U_2 , i.e. each component of u^I and u^{II} is in H , we may
write

u^I(x, t) = \sum_{j=0}^{J} C_j^I(t) \phi_j(x)

and                                                        (25)

u^{II}(x, t) = \sum_{j=0}^{J} C_j^{II}(t) \phi_j(x) ,

where C_j^I and C_j^{II} are vectors of dimension p and r , respectively.
The values of C_j^I(0) and C_j^{II}(0) for j = 0, ..., J are determined by
solving an interpolation problem using the initial data.


The bases for V_1 and V_2 are similarly chosen.  Substitution of (25) into
(22) then yields

\sum_{j=0}^{J} (\phi_i, \phi_j) \frac{d c_j^{k\ell}}{dt} = \lambda^{k\ell} \sum_{j=0}^{J} (\phi_i, \frac{d\phi_j}{dx}) c_j^{k\ell} + (\phi_i, F^{k\ell})    (26)

for \ell = I , II and i = 0, ..., J and k = 1, ..., p if \ell = I and
k = 1, ..., r if \ell = II .  Here c_j^{k\ell} represents the k-th component of
the vector C_j^{\ell} and similarly for \lambda and F .  If we let

(M)_{ij} = (\phi_i, \phi_j) ,   (Q)_{ij} = (\phi_i, \frac{d\phi_j}{dx})   for i, j = 0, ..., J

and

(F^{k\ell})_i = (\phi_i, F^{k\ell})   for i = 0, ..., J ,

then (26) may be expressed as

M \frac{d c^{k\ell}}{dt} = \lambda^{k\ell} Q c^{k\ell} + F^{k\ell} .

The matrix M is the Gram matrix for the basis { \phi_0 , ..., \phi_J } under the
inner product (23) and therefore M is symmetric and positive definite.
As a consequence of (24) we have that

(\phi_i, \frac{d\phi_j}{dx}) = -(\frac{d\phi_i}{dx}, \phi_j)

whenever either i or j is different from 0 or J .  Therefore the
matrix Q is skew-symmetric except for the (0, 0) and (J, J) elements.
These elements are given by

(Q)_{00} = -\frac{1}{2} [\phi_0(0)]^2 < 0   and   (Q)_{JJ} = \frac{1}{2} [\phi_J(1)]^2 > 0 .

The boundary conditions (20) and (21) may be applied by constraining the
coefficients c_j^{k\ell} appropriately.  We stress that

(v, u_t - A u_x) = 0   and   (v, u_t - \Lambda u_x) = 0

for non-diagonal A (with \Lambda its diagonal form) are not equivalent at the
boundaries.  In the latter case, a characteristic variable is left unconstrained
at the boundary, while in the former case some linear combination of
characteristic variables is left unconstrained at the boundary.  As indicated by
the results of [3], using non-diagonal boundary treatments can yield
instabilities.  However, the diagonal boundary treatment yields stable schemes.
Of course, in practice, we may implement the stable boundary treatment on the
non-diagonal system in much the same manner as that employed in Section 2 for
finite difference methods.

We may prove the following concerning the above Galerkin method.  (Again,
see [4] for details.)

Proposition  3  -  The  above  Galerkin  method  for  a  scalar  initial-boundary 
value  problem  is  stable. 

Proposition 4 - The above Galerkin method is stable for vector
semi-infinite problems, i.e. problems posed on 0 < x < \infty .

Theorem 5 - The above Galerkin method is stable for the vector
initial-boundary value problem, i.e. posed on 0 < x < 1 .

Thus,  as  in  the  finite  difference  case,  boundary  treatments  based  on  the 
use  of  characteristic  variables  yield  stable  approximations. 


4.  THE CHANGE OF NORM SCHEME.  We consider the general hyperbolic
initial-boundary value problem (A may be assumed diagonal)

u_t = A(x, t) u_x + B(x, t) u + F(x, t)   for 0 < x < 1 , t > 0    (23)

along with the initial conditions

u(x, 0) = u_0(x)   for 0 \le x \le 1    (24)

and boundary conditions

u^I(0, t) = S_0 u^{II}(0, t) ,   u^{II}(1, t) = S_1 u^I(1, t)   for t > 0 ,    (25)

where u , A , B and F are partitioned in the usual manner.  We immediately
may prove the following (see [5] for details).

Proposition 6 - There is an inner product (·,·)' on L^2 with
associated norm ||·||' equivalent to the standard L^2-norm
||·|| with respect to which the operator

L u = A u_x + B u

is semibounded, i.e. there is an \alpha < \infty such that
(Lu, u)' \le \alpha ||u||^2 .

The inner product (·,·)' of Proposition 6 is defined by

(v, u)' = \int_0^1 v^T G(x) u \, dx    (26)

where G is a positive definite symmetric matrix (diagonal when A is
diagonal).  The matrix G is chosen so that

S_1^T A^{II}(1) G^{II}(1) S_1 \le A^I(1) G^I(1)    (27)

and

S_0^T A^I(0) G^I(0) S_0 \le A^{II}(0) G^{II}(0)    (28)

where G has been partitioned corresponding to A .  For non-diagonal systems,
the matrix G may be chosen by first diagonalizing the system, choosing the
corresponding G , and then transforming back to non-diagonal form.

Using this G matrix, one can easily show that the exact solution of
our differential problem satisfies the estimate

||u(t)|| \le C [ e^{\alpha t} ||u(0)|| + \int_0^t e^{\alpha (t-s)} ||F(s)|| \, ds ]    (29)

where C and \alpha are constants (see [5]).
The modified Galerkin method we propose is at each time t to seek a
U \in S^h such that

(\frac{\partial U}{\partial t}, V)' = (A \frac{\partial U}{\partial x}, V)' + (BU, V)' + (F, V)'    (30)

for all V \in S^h .  Here S^h is a subspace of the solution space.  Note that
if we choose G = I so that (·,·)' = (·,·) , the ordinary L^2 inner product, the
method (30) is in general unstable (see [3]).

If the inner product (·,·)' is chosen as in (26) with G satisfying
(27) - (28), then analogously to (29) we may show (see [5]) that

||U(t)||' \le e^{\alpha t} ||U(0)||' + \int_0^t e^{\alpha (t-s)} ||F(s)||' \, ds ,

i.e. the semi-discrete Galerkin method (30) is stable.  Furthermore, if the
ordinary differential equations (30) are approximately solved by the
Crank-Nicolson method, we can prove that

||U^n||' \le e^{\alpha_k t} ||U^0||' + C \max_{0 \le s \le t} ||F(s)||'

where \alpha_k = \alpha (1 - \alpha k / 2)^{-1} , k is the time step and U^n is the
approximation to U(kn) .

We summarize these results in the following theorem (see [5] for details).

Theorem 7 - The Galerkin scheme based on (30), with the inner product
(·,·)' and matrix G chosen according to (26) - (28) is stable.
Furthermore, the fully discrete scheme based on (30) in conjunction
with a Crank-Nicolson method for approximating the time derivative
is also stable.


Results concerning the rate of convergence of both the semi-discrete and
fully discrete schemes based on (30) are also to be found in [5].  Some remarks
about extensions of this approach to problems in two space dimensions are also
given in [5].

We  conclude  by  giving  two  examples  of  the  construction  of  the  matrices 
G  appearing  in  the  inner  product  (26).  First  consider  the  system  (1)  with 
initial  conditions  (11)  and  boundary  conditions 

u(0,  t)  =  u(l,  t)  =  0  .  (31) 


Here, the system matrix is non-diagonal so that the matrix G will be
non-diagonal.  We note that
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where 


X 

2 


1 


Thus  the  diagonal  form  of  our  problem  is 


u^I = -u^{II}   at x = 0 ,   u^{II} = -u^I   at x = 1

plus  initial  conditions.  Thus,  in  the  notation  of  (26),  (27)  we  have 
.1  _  3  .II  _  1  c  -  i  c  =  „i 

A  2  *  A  *"■  2  *  ' 

so that the scalars G^I and G^{II} satisfy, by (26), (27)

G^{II}(1) \le 3 G^I(1)   and   3 G^I(0) \le G^{II}(0) .

These inequalities are satisfied by

G^{II}(x) = 3   and   G^I(x) = 1 .

Then 


We note that with the boundary conditions (31),

(\frac{\partial u}{\partial t}, u)' = (A \frac{\partial u}{\partial x}, u)' = 0

so that

||u(t)||' = ||u(0)||'   for t > 0 ,    (32)

i.e. the "G-energy" is conserved.

In our second example a constant G matrix will not work to simultaneously
satisfy (26), (27).  Consider the problem

\frac{\partial u}{\partial t} = A \frac{\partial u}{\partial x} ,   where u = (u^I, u^{II})^T ,

u^I(0, t) = 5 u^{II}(0, t)   and   u^{II}(1, t) = u^I(1, t)

plus initial conditions.  In the notation of (26) - (27), we have

A^I = 1 ,   A^{II} = 10 ,   S_0 = 5   and   S_1 = 1

so that if G = diag(G^I, G^{II}) , (26) - (27) require that

10 G^{II}(1) \le G^I(1)   and   25 G^I(0) \le 10 G^{II}(0)    (33)

which cannot be simultaneously satisfied by a constant matrix G .  However,
if we choose

G^I(x) = 8x + 2   and   G^{II}(x) = -4x + 5

then (33) is satisfied.  Thus, we may choose G(x) = diag(8x + 2 , -4x + 5) .

It can be shown that

(A \frac{\partial u}{\partial x}, u)' = (A \frac{\partial u}{\partial x}, G u) \le 40 ||u||^2 .
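The arithmetic behind (33) is easy to confirm. The short sketch below (all names are illustrative) checks that the linear weights G^I(x) = 8x + 2 and G^{II}(x) = -4x + 5 satisfy both inequalities and stay positive on [0, 1], and that no pair of positive constants from a sample grid does: for constants the two conditions combine to 25 G^I <= 10 G^{II} <= G^I, which is impossible for G^I > 0.

```python
def g_first(x):
    """Weight G^I(x) from the second example."""
    return 8.0 * x + 2.0

def g_second(x):
    """Weight G^II(x) from the second example."""
    return -4.0 * x + 5.0

def satisfies_33(gi, gii):
    """Check the two boundary inequalities (33)."""
    return (10.0 * gii(1.0) <= gi(1.0)) and (25.0 * gi(0.0) <= 10.0 * gii(0.0))

def positive_on_unit_interval(g, n=100):
    """Sample g on a uniform grid of [0, 1] and test positivity."""
    return all(g(i / n) > 0.0 for i in range(n + 1))

def constant_pair_exists(values):
    """Search a grid of positive constants for a pair satisfying (33)."""
    return any(
        satisfies_33(lambda x, a=a: a, lambda x, b=b: b)
        for a in values
        for b in values
    )
```

Both inequalities hold with equality for the linear weights (10 <= 10 at x = 1 and 50 <= 50 at x = 0), so this particular choice sits exactly on the boundary of the admissible set.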
EXTRAPOLATING METEOROLOGICAL DATA FOR
ARTILLERY APPLICATIONS

Abel  J.  Blanco 

US  Army  Atmospheric  Sciences  Laboratory 
White  Sands  Missile  Range,  NM  88002 

ABSTRACT.  Preliminary  results  derived  from  a  mathematical  algorithm  for 
calculating  impact  dispersion  due  to  meteorological  factors  are  presented. 
The  report  presents  a  comparison  of  three  techniques  for  extending  the  maximum 
ordinate  of  the  Artillery  Computer  Meteorological  Message  from  20  to  23  km, 
for  application  to  projectiles  traversing  higher  altitudes.  The  three 
techniques,  called  the  default,  the  extrapolation,  and  the  modified 
extrapolation  (or  climatological),  are  analyzed  against  data  from  69 
rocketsonde  flights  that  were  conducted  over  White  Sands  Missile  Range,  New 
Mexico,  during  1979.  The  measured  and  estimated  data  are  used  to 
ballistically  simulate  552  impact  displacements  for  a  trajectory  of  a  proposed 
rocket  system.  The  findings  show  that  the  extrapolated  meteorological 

correction  yields  a  significant  improvement  over  the  current  default  method  of 
using  a  standard  meteorological  message.  Impact  dispersion  error  analyses 
illustrate  that  a  software  addition  to  the  current  meteorological  message 
procedure  predicts  all  impacts  within  the  current  one  probable  error  when  the 
meteorological  message  is  extended  3  km  in  altitude. 

1.  INTRODUCTION.  With  advanced  technology  in  artillery  ballistics, 
projectile  delivery  at  ranges  greater  than  50  km  can  be  expected.  Under 
certain  conditions  these  projectiles  will  traverse  altitudes  higher  than  20  km 
above  ground  level  (AGL).  The  expected  meteorological  effects  on  the  target 
displacement  error  need  to  be  investigated  for  projectile  traversals  beyond  20 
km  AGL  because  the  current  computer  meteorological  message  reports  information 
only  to  20  km  AGL.  This  paper  presents  preliminary  results  from  a  comparison 
of  three  techniques  that  extend  the  maximum  ordinate  of  the  artillery 
meteorological  message  for  application  to  projectiles  traversing  higher  than 
the  20  km  meteorological  message  limit.  The  comparison  really  reduces  to  the 
question  of  how  well  the  actual  meteorological  profile  can  be  estimated  from 
available  information  at  the  lower  altitudes. 

The  techniques  investigated  include  the  current  default  method  of  using  a 
standard  meteorological  message,  the  method  of  extrapolating  available  data 
from  lower  levels,  and  the  method  of  using  climatological  values.  The  paper 
illustrates  the  effect  of  the  default  method  in  assuming  zero  wind  and  using 
temperature  and  density  and  pressure  profiles  representative  for  global 
applications.  The  method  of  extrapolating  wind,  temperature,  and  density  and 
pressure  provided  the  smallest  expected  (meteorological)  impact  displacement 
for  the  sample  considered.  For  extrapolations  extended  up  to  3  km  beyond  the 
20  km  current  maximum  altitude,  the  extrapolated  values  proved  to  be  good 
estimates of the actual ballistic parameter values affecting the projectile
impact.  The climatological method, which requires adjusted corrections from
available information at the lower altitudes, also showed a significant
improvement over the default method.  Its meteorological impact errors are
smaller than those of the default method but larger than those of the
extrapolation method, and it additionally requires climatological input.
The method is included in this study because it may prove
advantageous when extrapolated values are needed at ranges which cause the
ballistic trajectory to exceed the extended 3 km height.

The  development  of  the  extended  meteorological  message  techniques  and 
ballistic  simulation  programs  was  tested  by  using  a  single  rocket 
configuration.  The  selected  trajectory  reaches  65  km  range  and  traverses  23 
km  AGL  in  altitude.  Data  needed  to  describe  this  trajectory  (for  example, 
ballistic  wind  and  temperature  and  density  coefficients  including  weighting 
factors  and  unit  effects)  were  obtained  from  the  Project  Manager  of  the 
Multiple Launch Rocket System (MLRS).*  To attain this altitude, the
projectile had to be launched at 3048 m above sea level; consequently, the
meteorological  extending  techniques  could  be  evaluated  at  the  23  to  26  km 
level  of  the  lower  stratosphere.  As  is  the  case  for  the  artillery  techniques 
for  aiming  a  gun  (ref  1)  on  a  target,  this  paper  uses  the  launcher  surface  as 
the  zero  level. 

2.  EXTENDING METEOROLOGICAL APPLICATION.  Available techniques for
extending the meteorological data for projectiles reaching higher than 20 km
AGL range from hardware to software or a combination of these.  In this paper
only  software  techniques  will  be  discussed.  The  rocketsonde  data  are  assumed 
to  represent  the  actual  atmospheric  parameters;  then  the  extending  technique 
comparison  reduces  to  how  well  the  actual  meteorological  profile  can  be 
estimated  from  available  information  below  the  20  km  AGL  limit.  Also,  the 
implication  is  that  if  these  measured  meteorological  data  are  used  for  aiming 
an  artillery  piece,  then  the  displacement  due  to  meteorology  on  the  target  is 
zero.  When  the  true  meteorological  data  are  known,  the  simulated  fire 
provides  a  hit  every  time. 

The  first  technique  examined — one  which  the  Artillery  currently  uses — will 
be  called  the  default  method.  Whenever  a  meteorological  message  or 
climatological  tables  are  unavailable,  the  artillery  pieces  are  aimed  by  using 
a  meteorological  message  which  contains  standard  temperature  and  pressure  and 
density data.  The standard wind is a constant zero speed for all layers
(line numbers).  In cases where the meteorological messages are unavailable
or  are  not  complete  to  the  20  km  AGL  limit,  the  current  procedure  defaults  to 
the  standard  meteorological  conditions  for  the  missing  data. 

The second technique is extrapolation.  The missing data are defined from
the last available layer and are used to estimate the remainder of the
meteorological message for application up to the maximum ordinate of the
artillery projectiles.  A persistent wind is used: the wind direction
and windspeed at the 20 km layer are held constant up to the apogee of the
trajectory.  The extended values for temperature are computed by adding the
standard gradient of the temperature default method to the last known
temperature value.  Finally, the hydrostatic extrapolation of the density and
pressure is computed by using the extrapolated temperature values and the
last available density and pressure values.  The detailed extrapolation,
assuming the hydrostatic equation and the perfect gas law, yields the
following expressions:


*Personal  communication  between  Mr.  Henry  Oldham,  Missile  Command,  and  Dr. 
Donald  M.  Swingle,  Atmospheric  Sciences  Laboratory,  January-February  1980 
Gravity                 g_0 = 9.80665 m s^{-2}

Air molecular weight    M = 28.966 g mol^{-1}

Gas constant            R = 8314.32 J (°K)^{-1} mol^{-1}

Geopotential layer      \Delta H(I) = (9'g376)(I)(1000) m
                        where I = 1, 2, 3

Extended temperature    T(I) = T_0 + T_s(I) - T_{s0}
                        where T_0 = 20 km value
                              T_s = standard temperature
                              T_{s0} = standard temperature at 20 km
                              T in °C

Lapse rate              L(I) = [T(I) - T_0] / \Delta H(I)
                        L in (°K) km^{-1}

Extended density        \rho(I) = \rho_0 { [T_0 + 273.16] / [T(I) + 273.16] }^{1 + g_0 M / (R L)}
                        where \rho_0 = 20 km value
                              \rho in g/m^3

The  extrapolated  values  for  the  layers  of  1  km  thickness  are  extended  by 
iterating  the  above  relationships  with  respect  to  I  until  the  maximum  altitude 
desired  is  reached. 
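The relationships above can be sketched as a short routine. This is an illustration under several assumptions that are not in the source: the numerical factor in the geopotential layer thickness is not legible in our copy, so each layer is simply taken as 1000 m; the gas constant is used per kilomole (so that M = 28.966 is in kg per kmol); the lapse rate is carried in kelvin per metre so the exponent is dimensionless; and an isothermal fallback is added for a zero lapse rate. The function and parameter names are hypothetical.

```python
import math

G0 = 9.80665      # gravity, m s^-2
M_AIR = 28.966    # air molecular weight, treated as kg kmol^-1 (assumption)
R_GAS = 8314.32   # gas constant, treated as J K^-1 kmol^-1 (assumption)
KELVIN = 273.16   # offset used in the source expressions

def extend_profile(t0_c, rho0, t_std_c, t_std0_c, layer_m=1000.0):
    """Extrapolate temperature (deg C) and density above the 20 km layer.

    t0_c     : measured temperature at 20 km, deg C
    rho0     : measured density at 20 km, g m^-3
    t_std_c  : standard temperatures at layers I = 1, 2, ..., deg C
    t_std0_c : standard temperature at 20 km, deg C
    layer_m  : assumed layer thickness in metres (hypothetical value)
    """
    out = []
    for i, ts in enumerate(t_std_c, start=1):
        dh = layer_m * i                       # height above the 20 km level
        t_i = t0_c + ts - t_std0_c             # extended temperature T(I)
        lapse = (t_i - t0_c) / dh              # lapse rate L(I), K per metre
        if lapse == 0.0:
            # isothermal limit of the hydrostatic relation (added fallback)
            rho = rho0 * math.exp(-G0 * M_AIR * dh / (R_GAS * (t0_c + KELVIN)))
        else:
            ratio = (t0_c + KELVIN) / (t_i + KELVIN)
            rho = rho0 * ratio ** (1.0 + G0 * M_AIR / (R_GAS * lapse))
        out.append((t_i, rho))
    return out
```

As in the text, every layer is computed from the 20 km base values rather than from the layer below, so the loop over I is independent from one layer to the next. The persistent wind of the extrapolation technique needs no formula: the 20 km wind is copied unchanged to each extended layer.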

The  third  technique  is  defined  by  a  modification  to  the  extrapolated 
technique.  This  method  uses  climatological  data  to  estimate  values  of  the 
unavailable  data.  The  difference  between  the  data  at  the  20  km  AGL  layer  and 
the  data  of  the  climatological  values  for  the  time  of  year  and  location  of 
actual  meteorological  application  is  used  to  adjust  the  climatological 
estimate.  Even  though  the  Field  Artillery  does  not  have  climatological  tables 
available  for  these  extended  heights,  this  technique  was  included  to  develop 
the  concept  of  translating  the  meteorological  trend  from  climatological  or 
fallout  meteorological  messages  to  continue  the  extended  meteorological 
message  from  the  20  km  AGL  values. 

The  US  Army  Field  Artillery  needs  an  estimate  of  the  meteorological  impact 
displacement  for  proposed  high  trajectory  weapons.  Therefore,  the  emphasis  of 
this paper is to estimate the ballistic meteorological effects and not the
actual value of the missing meteorological data at the extended altitudes.
The  three  methods  for  extending  meteorological  data  above  the  altitude 
actually  measured  are  then  transformed  into  a  departure  from  a  selected 
meteorological  standard,  and  the  error  in  failing  to  estimate  the  ballistic 
atmospheric  effect  will  be  illustrated  by  a  displacement  about  the  target.  In 
summary,  figure  1  illustrates  the  percent  departures,  plotted  as  (.),  from  the 
United  States  Standard  Atmosphere  (USSA)  1962  (ref  2)  for  16  rocketsonde  data 
flights  collected  during  January  1979.  This  is  the  standard  atmosphere  the 
Ballistic  Research  Laboratory  uses  for  trajectory  computations  (ref  3).  The 
departures  for  the  month's  climatological  data  are  also  plotted  (x). 
Extending  technique  1  uses  the  default  value  of  the  USSA  (no  departure). 
Technique  2  uses  the  wind  components  measured  at  the  20  km  layer  and  also  uses 
this  value  at  the  21,  22,  and  23  km  layers.  The  extended  values  for  the 
departure  temperatures  and  density  and  pressure  values  are  the  normalized 
deviations  between  the  extended  values  and  corresponding  values  of  the  USSA. 
Technique  3  adds  the  climatological  data  with  respect  to  the  corresponding 
heights  and  the  difference  between  the  last  available  data  and  the  climatology 
at  20  km  to  compute  the  data  at  the  missing  layers.  By  superimposing  the 
climatological  departure  value  (x)  on  a  particular  value  of  the  20  km  level, 
one  computes  the  difference  that  will  be  arithmetically  added  to  the  remaining 
climatological  profile  levels. 
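
The offset used by technique 3 can be sketched in a few lines of Python. The function name and the sample departures are hypothetical; only the arithmetic (difference at 20 km added to the remaining climatological levels) follows the description above.

```python
def extend_with_climatology(measured_20km, clim_profile):
    """Shift the climatological profile so that it matches the 20 km value.

    clim_profile[0] is the climatological value at 20 km; the remaining
    entries are the climatological values at the missing layers.
    """
    offset = measured_20km - clim_profile[0]
    return [value + offset for value in clim_profile[1:]]


# Hypothetical percent departures at 20, 21, 22, and 23 km.
clim = [2.0, 2.5, 3.0, 3.5]
extended = extend_with_climatology(measured_20km=-1.0, clim_profile=clim)
```

The climatological trend with height is retained, while the level of the profile is anchored to the last measured value.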

3. BALLISTIC SIMULATION. A comparison of the impact dispersions
(realized by the three techniques) was reviewed to evaluate the extending
techniques  and  to  gain  some  insight  on  the  effect  of  the  extended 
meteorological  message.  This  report  assumes  that  the  actual  meteorology  is 
defined  as  the  measured  parameters  deduced  from  the  rocketsonde  data  (ref 
4).  These  data  were  then  represented  in  the  Artillery  computer  meteorological 
message  format  (ref  5)  with  new  layers  of  1  km  thickness  added  to  complete  an 
extended  message  to  23  km  AGL.  The  investigated  techniques  used  measured  data 
below  20  km  AGL  and  extrapolated  or  climatological  data  for  each  layer  up  to 
the  maximum  ordinate  of  23  km  AGL.  Using  the  same  data,  each  extending  method 
yields a dispersion about an assumed target. The meteorological technique
that yields the smallest dispersion about the simulated target is selected as
the best of those tested.

The corresponding dispersions are defined as the group of displacements
calculated by the ballistic weighting technique. Here an algorithm is
introduced that utilizes the extended messages and ballistically computes a
displacement about a fixed target. This algorithm can be used to compute the
deviation between the extended and a standard (USSA) method. This deviation
is then normalized with respect to the standard (USSA) condition. The
deviation is calculated for the averaged parameter (wind, temperature, density
and pressure) F(Z) at each layer through 23 km. Finally, in the ballistic
technique, the normalized parameter is multiplied by the weighted response
function [δω'(Z)]. This function contains the ballistic characteristics of
the high trajectory weapon system. The required information is the weighting
factors and the unit effect for each of the meteorological parameters at the
identical layer structure of the extended messages under evaluation. The sum
of these products through the maximum altitude of the proposed trajectory
yields the effective displacement (D) from the standard conditions. In
reality,  by  knowing  this  displacement,  an  artilleryman  can  compensate  for  the 
meteorological  deviations  from  the  standard  by  appropriately  adjusting  his 
weapon aim and firing for effect. This displacement is formulated as follows:

$$D = \int_{Z} \delta\,\omega'(Z)\,\frac{F(Z) - F_s(Z)}{F_s(Z)}\,dZ, \qquad (1)$$

where δ = unit effect; ω'(Z) = ballistic weighting; dZ is the increment of
height; and the parameter F(Z) is temperature, density, or wind. In the case
of wind there is no standard, and the F(Z) is not normalized.

Figure 1. Percent departures from the 1962 United States Standard Atmosphere,
16 rocketsonde data flights. [Plot axes: altitude (km); temperature (2%);
density (2%). Legend: (.) January rocketsonde data; (x) January climatological
data; (o) January 4, 1979, 1900 hours, data.]
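
For a layered message, the displacement integral reduces to a sum over layers. The sketch below is one hedged reading of that sum: the function name, the argument layout, and all numbers are hypothetical, and the wind case is handled simply by omitting the standard profile.

```python
def displacement(unit_effect, weights, values, standard=None, dz=1.0):
    """Layer-sum approximation of the ballistic displacement D.

    weights  : ballistic weighting per layer
    values   : parameter F(Z) per layer
    standard : standard values Fs(Z) per layer, or None for wind,
               in which case F(Z) is used without normalization
    """
    total = 0.0
    for k, weight in enumerate(weights):
        if standard is None:
            departure = values[k]
        else:
            departure = (values[k] - standard[k]) / standard[k]
        total += unit_effect * weight * departure * dz
    return total


# Hypothetical three-layer example (density in g/m^3, wind in m/s).
d_density = displacement(100.0, [0.02, 0.015, 0.01],
                         values=[88.0, 75.0, 72.0],
                         standard=[80.0, 75.0, 80.0])
d_wind = displacement(2.0, [0.1, 0.1], values=[5.0, -3.0])
```

Layers whose value equals the standard contribute nothing, which is why technique 1 (default standard) yields no correction above 20 km.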


A  sample  of  rocketsonde  flights  containing  different  atmospheric 
conditions  yields  a  set  of  impact  displacements  describing  the  dispersion  of 
the  analyzed  weapon  system.  This  dispersion  is  mathematically  represented 
with  a  bias  and  a  variance  for  each  component  (cross  and  range)  about  the 
target.  The  conventional  artillery  practice  is  to  describe  the  dispersion  of 
a  weapon  in  terms  of  a  circular  error  probable  (CEP)  (ref  1).  This  criterion 
is  defined  as  the  circular  radius  of  the  smallest  circle  about  the  target  that 
contains  one-half  of  the  total  impact  displacements.  This  procedure  is  used 
even  though  the  actual  dispersion  of  a  gun  is  elliptical.  In  demonstrating 
the  differences  between  the  evaluated  extending  techniques,  this  report  uses 
elliptical probable error rather than the CEP. There are cases when a small
dispersion is biased too far from the target, thereby rendering artillery fire
ineffective. One is cautioned that converting to CEP about the target
will change the comparison of results and hence the interpretation of the
evaluated meteorological messages. The bias due to meteorological parameters
is a major contributor to the impact displacement. In practice, through
observed fire the Fire Direction Center would correct for this bias, which is
caused by the unavailability of a meteorological message update or the lack of
a procedure to obtain data above 20 km AGL.
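
The distinction between CEP about the target and a bias-plus-variance description can be illustrated with a short Python sketch. The impact coordinates are invented for illustration, and the use of the population variance (divisor n) is an assumption, since the paper's own variance expression is not legible.

```python
import math


def cep_about_target(impacts):
    """Radius of the smallest target-centered circle holding half the impacts."""
    radii = sorted(math.hypot(cross, rng) for cross, rng in impacts)
    half = math.ceil(len(radii) / 2)
    return radii[half - 1]


def bias_and_variance(impacts):
    """Per-component mean (bias) and population variance of the impacts."""
    n = len(impacts)
    bias = tuple(sum(p[k] for p in impacts) / n for k in (0, 1))
    var = tuple(sum((p[k] - bias[k]) ** 2 for p in impacts) / n for k in (0, 1))
    return bias, var


# Hypothetical (cross, range) displacements in meters: a tight group
# that is biased about 100 m from the target at the origin.
impacts = [(100.0, 0.0), (101.0, 1.0), (99.0, -1.0), (100.0, 2.0)]
cep = cep_about_target(impacts)
bias, variance = bias_and_variance(impacts)
```

Here the CEP about the target is dominated by the bias even though the scatter about the mean impact point is small, which is the case the text cautions about.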

The  results  show  that  the  dispersion  is  a  function  of  the  atmospheric 
condition. Wind, temperature, and density and pressure affect the range
impact displacement, while only wind affects the cross component (ref 5).
Since the azimuth of fire determines the wind bias, calculation of a
mathematical composite of eight single azimuth (θ_j) dispersions was considered
to be more appropriate. The weapon system was therefore launched at targets
on  a  circle  of  radius  of  65  km  at  increments  of  45  degrees.  Figure  2 
illustrates  the  one-probable-error  dispersions  produced  from  16  rocketsonde 
flights  collected  during  the  same  month  and  at  the  same  location.  All  impacts 
were  computed  without  an  extended  meteorological  correction  between  20  to  23 
km  AGL.  The  effectiveness  of  fire  is  different  for  the  particular  target. 
This  paper  groups  the  128  impact  displacements  and  defines  the  composite 
dispersion  plotted  in  the  center  of  figure  2.  Notice  that  the  range  and  cross 
bias  due  to  the  wind  are  cancelled  in  the  composite  dispersion.  This 
cancellation  would  also  be  true  for  a  single  azimuth  target  if  the  sample 
rocketsonde  data  included  winds  from  all  directions.  The  temperature  and 
density  and  pressure  bias  are  not  cancelled  because  of  the  nature  of  the 
ballistic computation. If the sample contained data with the temperature and
density and pressure above and below the standard, then the bias would be
affected in the composite. The next section will present results for the
rocketsonde data collected during different months illustrating the variation
of temperature and density and pressure effects. The composite results can be
interpreted as results of a large sample containing 128 rocketsonde flights
collected in the same month. With the inclusion of several months of data
collected at one location and following the outlined procedure, the final
results can be interpreted for general application.

Figure 2. One probable error elliptical dispersions from 16 impact
displacements computed without extended meteorological correction between
24 and 26 km above mean sea level. The weapon system is fired at
targets on a circle of radius 65 km at 45-degree increments. The
center dispersion is the mathematical composite of the 128
displacements. [Plot annotations: NORTH; grid = 50 m.]
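
The cancellation of the wind bias in the eight-azimuth composite can be checked numerically. The sketch below assumes that the wind-induced displacement is proportional to the wind component along (range) or across (cross) the azimuth of fire; the wind speed and direction are hypothetical.

```python
import math


def wind_components(speed, wind_dir_deg, azimuth_deg):
    """Range (along-fire) and cross components of a horizontal wind vector."""
    angle = math.radians(wind_dir_deg - azimuth_deg)
    return speed * math.cos(angle), speed * math.sin(angle)


# One fixed wind, eight azimuths of fire at 45-degree increments.
azimuths = [45.0 * j for j in range(8)]
pairs = [wind_components(20.0, 250.0, az) for az in azimuths]
range_bias = sum(p[0] for p in pairs) / len(pairs)
cross_bias = sum(p[1] for p in pairs) / len(pairs)
```

Each single azimuth sees a nonzero range and cross wind, but the averages over the eight azimuths vanish to rounding error, so only the temperature and density terms can leave a bias in the composite.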


For each rocketsonde flight i, equation (1) is applied to the cross (D_c) and
range (D_r) components as follows:

$$D_{c,i}(\theta_j, Z) = \delta_c \sum_{Z} \omega_c(Z)\, W_{c,i}(\theta_j, Z); \qquad (2)$$

$$D_{r,i}(\theta_j, Z) = \delta_r \sum_{Z} \omega_r(Z)\, W_{r,i}(\theta_j, Z) + \delta_T \sum_{Z} \omega_T(Z)\, \frac{T_i - T_s}{T_s} + \delta_\rho \sum_{Z} \omega_\rho(Z)\, \frac{\rho_i - \rho_s}{\rho_s}. \qquad (3)$$

The cross component does not contain the temperature (T) and density (ρ)
effects as illustrated in equation (3). The displacement statistics for the
error due to the unextended meteorological message are computed as follows:

$$\mathrm{Bias} = \sum_{i}^{n} \sum_{j}^{8} D_i(\theta_j, Z) / 8n; \qquad (4)$$

[Equation (5), the second displacement statistic, is illegible in the source.]

Generalizing  the  results,  consider  that  for  each  (i)  rocketsonde  flight 
there  are  (j  =  8)  azimuths  providing  a  total  of  8n  impact  displacements  for 
each  month. 

4.  TECHNIQUE  COMPARISON.  Measurements  from  69  rocketsonde  data  flights 
collected  at  White  Sands  Missile  Range  (WSMR),  New  Mexico,  during  January 
through  June  1979  were  used  to  compute  the  meteorological  displacement  for  the 
high  trajectory  projectile  at  the  simulated  65  km  range  target.  Since  the 
evaluation  of  the  three  proposed  extending  techniques  is  based  on  the 
comparison  of  the  dispersion  from  the  simulated  displacements,  the  formula  in 
equation  (1)  is  computed  for  heights  of  20  through  23  km  AGL.  These 
computations  represent  the  meteorological  effects  which  are  not  compensated 
for  when  the  selected  projectile  is  fired.  However,  use  of  extending 
meteorological  data  techniques  will  provide  meteorological  compensation  for 
the  missing  data  and  should  improve  the  accuracy. 

The  meteorological  effect  from  surface  through  20  km  AGL  is  not  computed 
in  this  report  since  the  first  20  km  of  data  are  the  same  for  each  of  the 
three  extended  meteorological  messages.  The  extended  meteorological  message 
that best estimates the rocketsonde data will yield the smallest dispersion
about the target at 65 km range. Only the displacement due to meteorology
above 20 km is analyzed. The largest displacement is 164 m and the smallest
is 19 m. Note that this study assumes that there is no time and space
difference  between  the  point  of  measurement  and  application.  For  an  actual 
firing,  these  errors  are  further  increased  by  a  factor  determined  from  the 
time  and  space  variability. 

The  results  indicate  that  the  presently  used  default  values  for  the 
meteorological  message  above  20  km  AGL  yield  large  displacement  variations 
that  the  Field  Artillery  should  be  correcting. 

The  miss  distance  is  computed  for  an  extrapolation  defined  from  the  last 
available  data  estimating  the  missing  three  meteorological  layers.  Another 
miss  distance  computed  is  that  represented  by  persistent  meteorology  modified 
by  climatological  gradients.  The  following  interpretation  can  be  made:  If 
the  high  trajectory  projectile  were  fired  on  a  cross-road  target  located  65  km 
in  range  on  4  January  1979,  1900  hours,  using  the  current  artillery  default 
method, it would miss the target by 164 m. The smallest miss for the month is
50  m  (17  January)  and  this  assumes  that  there  are  no  other  time  and  space 
associated  meteorological  contributions.  This  unacceptable  error  can  be 
improved  significantly  by  any  of  the  proposed  extending  techniques.  By  the 
simple  extrapolated  technique,  the  164-m  miss  is  reduced  to  37  m  and  the  50-m 
miss  to  17  m.  A  statistical  extrapolation  technique  may  provide  further 
improvement.  This  improvement  is  expected  from  the  better  estimate  of  the 
wind  and  density  effect.  The  temperature  related  errors  are  small  because  the 
variations  at  23  to  26  km  (above  mean  sea  level)  were  small;  and  when 
normalized  with  the  standard  (in  degrees  Kelvin),  the  ballistic  effect  is  a 
minimum. 

In  this  study,  a  procedure  is  automated  to  compute  the  expected 
meteorological  errors  associated  with  the  high  trajectory  profile.  The 
algorithm  compares  the  statistics  from  the  evaluated  techniques.  In  summary, 
the  no-correction  or  the  default  displacement  is  computed  first  by  setting  J  = 
0.  This  error  is  the  total  effect  of  the  extended  layers  as  computed  from  the 
actual  rocketsonde  data.  For  each  flight,  this  displacement  is  saved  for 
comparison  with  the  other  evaluated  techniques.  The  difference  and  square  of 
difference  are  saved  to  compute  statistics  leading  to  description  of  the  one 
probable error, elliptical dispersion. In detail the miss distance is
defined, using no-correction or default standard, as J0(C0, R0). E(C1, R1)
is the miss distance computed by using the extrapolated correction. The miss
distance provided from the climatology method is labeled as E(C2, R2). The
differences J1 and J2, where


J1 = E(C1, R1) - J0(C0, R0),
J2 = E(C2, R2) - J0(C0, R0),          (6)



provide  the  comparative  values  for  the  evaluated  methods.  A  difference  equal 
to  zero  indicates  that  the  extended  method  has  fully  compensated  for  the 
actual  extended  values.  The  value  of  the  difference  is  the  error  that  remains 
uncompensated. By grouping the corresponding displacements, one can then
compare the evaluated technique dispersions.

Table 1 presents statistics partitioned into the January, February, March,
April,  May,  and  June  subsets  of  16,  13,  12,  12,  8,  and  8  rocketsonde  data 
flights.  An  analysis  of  the  total  sample  shows  that  there  is  a  64  percent 
improvement  afforded  by  the  extrapolated  method  over  the  current  default 
method.  Figure  3  presents  a  graphic  demonstration  of  improved  accuracy. 
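
The quoted improvement follows directly from the total-sample RMS values in Table 1 (78 m default, 28 m extrapolated, 37 m climatology). A short arithmetic check, with a hypothetical helper name:

```python
def percent_improvement(baseline_rms, technique_rms):
    """Percent reduction of the RMS miss relative to the baseline."""
    return 100.0 * (baseline_rms - technique_rms) / baseline_rms


# Total-sample RMS miss distances in meters, from Table 1.
extrapolated_gain = percent_improvement(78.0, 28.0)
climatology_gain = percent_improvement(78.0, 37.0)
```

The extrapolated message gives roughly 64 percent, as stated, and the climatological message roughly 53 percent.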

To  assure  the  reader  that  this  sample  provides  representative  results,  a 
test  of  significance  was  performed.  The  chi  square  distribution  test  involves 
the  comparison  of  the  computed  displacements  versus  the  expected 
displacements.  A  desired  risk  is  selected,  and  a  test  statistic  is  compared 
with the chi square table value (ref 6). This test statistic is defined as
follows:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}, \qquad (7)$$

where O_i is the observed frequency of occurrence of the computed
displacements, E_i is the expected frequency of displacement for the different
technique, and χ² is the computed chi square value.

For  ease  in  organizing  the  results,  a  contingency  table  is  arranged  in 
table  2.  The  expected  number  of  less  than  30  m  displacement  is  computed  as 
follows:  If  there  were  no  difference  in  the  effect  of  the  three  techniques, 
the  fraction  of  displacement  with  better  than  30  m  would  be  expected  to  be  the 
same  ratio  as  the  totals  in  the  last  column  of  table  2.  The  number  of  the 
sample  displacements  is  multiplied  by  this  ratio  to  define  the  expected 
results. The computed value of χ² is greater than 34. Since the calculated
value  exceeds  the  table  value  (10),  the  conclusion  is  that  the  data  indicate  a 
difference  from  the  expected  value  with  a  risk  less  than  0.005. 
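
The test statistic can be reproduced from the counts in table 2 (5, 50, and 37 displacements within 30 m out of 69 per technique). The sketch below assumes the usual sum of squared differences divided by the expected count, and it assumes two degrees of freedom when computing a critical value for comparison; neither detail is legible in the source.

```python
import math

observed = [5, 50, 37]     # displacements within 30 m for J = 0, 1, 2
per_technique = 69         # displacements evaluated per technique

# Expected count if the three techniques did not differ: 92/207 of 69.
expected = sum(observed) * per_technique / (len(observed) * per_technique)

chi_square = sum((o - expected) ** 2 / expected for o in observed)

# Critical value at risk 0.005, ASSUMING 2 degrees of freedom, for which
# the chi square upper-tail probability is exp(-x/2).
critical = -2.0 * math.log(0.005)
```

The expected count is about 30.7 and the statistic is slightly under 35, well above the critical value of roughly 10.6, in agreement with the conclusion drawn in the text.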

5.  CONCLUSIONS.  There  is  a  large  variation  in  the  displacement  effect 
due  to  the  measured  rocketsonde  data  collected  at  23  through  26  km  above  mean 
sea  level.  For  this  theoretical  study,  the  largest  meteorological 
displacement  in  the  sample  size  of  69  is  164  m  and  the  smallest  is  19  m.  Note 
that  for  an  actual  firing  these  errors  are  further  increased  by  a  factor 
determined  from  the  time  and  space  differences  between  the  point  of 
measurement  and  application  of  the  meteorological  data.  Under  the  assumption 
of  no  time  and  space  variability,  extrapolated  meteorological  data  above  20  km 
AGL  yielded  a  significant  improvement  over  the  current  default  method  of  using 
a  standard  meteorological  message.  The  total  rms  78-m  displacement  error  was 
reduced  to  28  m.  The  comparison  reduces  to  how  well  the  actual  meteorological 
profile  can  be  estimated  from  available  information.  If  the  estimate  is  poor, 
then  actual  measurements  become  important.  Preliminary  results  for  the  high 
trajectory  projectile  considered  indicate  that  a  software  addition  to  the 
current  message  procedure  may  be  sufficient.  This  indication  appears  to  be 
true  when  the  meteorological  message  is  extended  3  km  in  altitude  for 
compensating meteorological effects on a 65 km range trajectory.

Figure 3. Graphic display of improved one probable error afforded by the
extrapolated and climatological messages.

TABLE 1. COMPARISON OF RMS MISS FOR THREE EXTRAPOLATED MET MESSAGES
USED AS INPUT FOR SIMULATED TRAJECTORY
(RMS miss in meters)

Month                     Jan   Feb   Mar   Apr   May   Jun   Total
Sample size                16    13    12    12     8     8      69
Rocketsonde (actual impact)   [row entries illegible in source]
Techniques (20-23 km)
  Default standard        106    55    58    64    74    89      78
  Extrapolated             36    25    30    27    12    20      28
  Climatology              43    39    46    27    18    27      37

TABLE 2. CONTINGENCY TABLE BASED ON RESULTS OF TEST OF
THREE EXTENDING METEOROLOGICAL TECHNIQUES

Technique J =               0      1      2    Total
Total displacement         69     69     69      207
≤ 30 m criteria             5     50     37       92
Expected improvement     30.7   30.7   30.7

The next report to the United States Field Artillery School will present
the  status  on  the  accuracy  and  dispersion  effects  on  target  impact 
displacement  provided  by  using  statistical  extrapolation  techniques.  The 
improvement  expected  originates  from  bounded  physical  estimates  of  density  and 
temperature  effects  and  modified  persistent  winds  with  expected  wind  gradient 
effects.  Instead  of  climatology,  the  last  available  fallout  message  can  be 
used  to  provide  the  trend  of  the  missing  data.  A  more  representative  case  of 
a  high  trajectory  traversing  to  the  middle  stratosphere  will  be 
investigated.  Under  this  condition,  the  default  method  of  using  the  standard 
meteorological  message  is  expected  to  yield  increasingly  larger  errors. 

6. SUMMARY. The United States Army Field Artillery School needs to know
the expected meteorological impact displacement for new weapons traversing the
atmosphere to altitudes where measurements are not available from the
meteorological field units. The preliminary status is that simple persistence
for extrapolating the wind, extending temperature by adding the standard
gradient to the last known temperature value, and using the hydrostatic
extrapolation of density and pressure significantly reduce the meteorological
impact error. The improvement is summarized as allowing all impacts to locate
within  the  current  one  probable  error  dispersion.  A  software  addition  to  the 
current  meteorological  message  procedure  reduces  the  error  when  the  message  is 
extended 3 km in altitude.


COMPUTER-AIDED SOLUTION OF THE BACTERIAL SURVIVAL EQUATIONS
IN MICROBIOLOGY II. - EIGENVALUES FOR HEAT DIFFUSION EQUATION FOR
FINITE CYLINDERS AND PLATES BY COMPUTER

Chia Ping Wang and Ari Brynjolfsson
Natick, Massachusetts 01760


ABSTRACT


The temperature distribution in a specimen during heating and cooling
is determined by the heat diffusion equation with appropriate boundary
conditions. In convective heating, the eigenvalues for the heat diffusion
equation, for cylinders and plates, are given by the roots of either or
both of the following equations:

$$x_n J_1(x_n) = \nu\, J_0(x_n)$$

$$C \cos y_j = y_j \sin y_j$$

The present paper describes the solution of these equations by computer
to their 36th roots for parameter ν extending from 0 to 20,000, and
C from 0 to 7,000, with rapid convergence. The parameter ν corresponds
to the "conductive Nusselt number", or the Biot number, in convective
heating.

The very rapid and accurate calculations of the eigenvalues for the
very wide ranges of ν and C covered have great applicability in many
fields of thermal engineering. In the present case, these very rapid
calculations have made the evaluation of the microbial kill practicable
for food samples of various sizes under different heating conditions.


I.  INTRODUCTION 

The task of finding changes with time of the heat distribution in
samples heated from outside is an important engineering problem in
industry. It is important in the thermal food canning industry, in
nuclear power generation plants, and in many other fields.

The thermal diffusion equation, with given boundary condition, can be
solved and the results are usually reported in tables. Graphical methods
are then used to interpolate and extrapolate the tabulated values. When
the effect of the time temperature relationship on different systems is
to be estimated, the calculations become very tedious and time-consuming.

For instance, when the time temperature relation is to be used for
estimating the microbial kill in the canning industry, the standard
procedures are very tedious and, for all practical purposes, make the
effect of small variations in the different experimental parameters
too difficult to estimate accurately. In previous papers [1,3,4] we
have shown methods for computer calculation of the diffusion equation and
we have shown, as an example, how these could be used directly to calculate
the microbial kill in the canning industry. We now report on a method
which shortens the computer time for these calculations from one hour
to one minute for each sample, and still retains a high degree of accuracy.

II. BASIC EQUATIONS

The temperature distribution T(x,y,z,t) in a specimen during heating
and cooling of its outer surface is determined by the heat diffusion equation

$$\frac{\partial T}{\partial t} = \kappa \nabla^2 T \qquad (1)$$

with appropriate boundary conditions. In paper I [1], it was shown
that for an infinitely long cylinder in convective heating, the boundary
condition is given by the equation

$$\frac{\partial \theta}{\partial r} = -\frac{h}{\lambda}\,\theta, \quad \text{at } r = a \qquad (2)$$

where θ = T - T_h, and T_h is the heating temperature,
h = coefficient of heat transfer at the surface
λ = thermal conductivity of the cylinder material
a = radius of the cylinder
r = radius vector of the cylindrical coordinates used.

Eqs. (1) and (2) lead to

$$x_n J_1(x_n) = \frac{h a}{\lambda}\, J_0(x_n) \qquad (3)$$

or $x_n J_1(x_n) = \nu\, J_0(x_n)$,

where (h/λ)a = ν, and J_0 and J_1 are the Bessel functions of order zero and
one respectively. The eigenvalues for Eq. (1), with the boundary condition
Eq. (2), are

$$\alpha_n = x_n / a \qquad (4)$$

Thus, the eigenvalues in this case are the roots of Eq. (3) divided by
the radius a.

In Eq. (3),

$$\nu = \frac{h}{\lambda}\, a \qquad (5)$$

is the "conduction Nusselt number" or the Biot number [2].

We note that h/λ in the constant ν comes from the boundary condition
Eq. (2) and includes the effects of all surface heat resistances and surface
radiation exchanges.

With these notations, the temperature θ = T - T_h is obtained by
solving the heat diffusion equation. For θ_1 = T_1 - T_h, where T_1 is the
initial temperature of the cylinder, we get [1, 3]:

$$\theta = 2\theta_1 \sum_{n=1}^{\infty} \frac{e^{-\kappa \alpha_n^2 t}\, J_0(\alpha_n r)}{x_n \left[ (x_n/\nu)^2 + 1 \right] J_1(x_n)} \qquad (6)$$

which gives the temperature θ = T - T_h for any point at a distance r from
the axis of the infinite cylinder.

We will then consider a finite cylinder of half-length ℓ. In
addition to the boundary condition Eq. (2) for the curved surface, we
have the following boundary conditions at the ends perpendicular to the
z axis [1,4,5]

$$\frac{\partial \theta}{\partial z} \pm \frac{h}{\lambda}\,\theta = 0 \quad \text{at } z = \pm \ell \qquad (7)$$

The general solution of Eq. (1), with the boundary conditions
Eqs. (2) and (7), can be obtained by the usual technique of separation of
variables [1, 4]. The conditions, Eq. (7), lead to:

$$\frac{h \ell}{\lambda} \cos \lambda_j \ell - \lambda_j \ell \sin \lambda_j \ell = 0 \qquad (8)$$

or $y_j \sin y_j = C \cos y_j$,

where

$$C = h \ell / \lambda \qquad (9)$$

is a constant corresponding to the conduction Nusselt number ν for the
curved surface, and

$$\lambda_j = y_j / \ell \qquad (10)$$

are the eigenvalues of the separated z equation of the heat diffusion
equation.

The methods for solving the heat diffusion equation (1) in the case of a
finite cylinder can be applied directly to solve the heat distribution in
samples having other forms. For an infinite plate of half-thickness ℓ
along the z direction, the eigenvalues are again the λ_j of Eq. (8). For a
rectangular parallelepiped, we have three separate sets of λ_j's for the
three coordinates x, y and z, which have to be calculated separately.

In Section III below, we give a method to solve Eq. (3) to its
36th root by computer, for values of ν extending from 0 to 20,000, with
rapid convergence. In Section IV, we use a similar method to solve Eq. (8)
to its 36th root by computer for values of C extending from 0 to 7,000.

The computation time to their 36th roots for each value of ν or C on the
UNIVAC 1106 is about 1 to 2 seconds depending on the number of iterations.

III. SOLUTION OF x_n J_1(x_n) = ν J_0(x_n)

We rewrite Eq. (3) as

$$f(x) = x\, J_1(x) - \nu\, J_0(x) = 0 \qquad (11)$$

and use the Newton-Raphson method of iteration [6] to find the successive
roots of f(x). In order to carry out such a calculation, one suitable
approximate value of x must be assigned to each root. This is accomplished
by observing the asymptotic expressions for J_0(x) and J_1(x):

$$J_0(x) \approx \sqrt{\frac{2}{\pi x}}\, \cos(x - \pi/4) \qquad (12)$$

$$J_1(x) \approx \sqrt{\frac{2}{\pi x}}\, \sin(x - \pi/4) \qquad (13)$$

From Eqs. (12) and (13) we obtain the following asymptotic expression
for Eq. (11),

$$x \tan(x - \pi/4) - \nu = 0 \qquad (11a)$$

Let us introduce δ such that

$$x = n\pi + \frac{\pi}{4} + \delta \qquad (14)$$

Eq. (11a) becomes

$$\left( n\pi + \frac{\pi}{4} + \delta \right) \tan \delta = \nu \quad \text{or} \quad \tan \delta = \frac{\nu}{n\pi + \frac{\pi}{4} + \delta} \qquad (15)$$

which for large x or n can be approximated by

$$\delta \approx \frac{\nu}{n\pi + \beta\pi} \qquad (16)$$

where we have set

$$\frac{\pi}{4} + \delta = \beta\pi \qquad (17)$$

From Eq. (14)

$$x \approx n\pi + \frac{\pi}{4} + \frac{\nu}{n\pi + \beta\pi} \qquad (18)$$

It was found by actual computer calculation that x_n given by Eq. (18)
can be used as the initial value in the iteration, not only for obtaining the
first root x_0 with the assigned accuracy with rapid convergence, but also for
obtaining all the first 36 roots within the assigned accuracy with equally
rapid convergence. Moreover, the same initial value in the iteration could
be used to calculate the 36 roots for a large range of ν-values. We thus
express the parameter β in terms of the first root x_0 using Eq. (18)

$$\beta = \frac{\nu}{\pi\,(x_0 - \pi/4)} \qquad (19)$$

and treat x_0 as a parameter after substituting Eq. (19) into Eq. (16)

$$\delta \approx \frac{\nu}{n\pi + \nu/(x_0 - \pi/4)} \qquad (20)$$

For the initial values of x we then use:

$$x \approx n\pi + \frac{\pi}{4} + \frac{\nu}{n\pi + \nu/(x_0 - \pi/4)} \qquad (21)$$

for the first 36 roots, for the given range of ν.

For the different ranges of ν, we let x_0 assume different values
(as a parameter), and obtain thus the first 36 roots of x for ν extending
from 0 to 20,000 with rapid convergences as mentioned before.
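
The procedure of this section can be sketched in Python as follows. This is an illustration under stated assumptions and not the authors' program: the Bessel functions are evaluated from Bessel's integral with the trapezoidal rule instead of the BSSL subroutine, the starting value for root n is n·π + π/4 + ν/(n·π + ν/(x0 - π/4)), and the choice x0 = 2.0 for ν = 5 is a hypothetical parameter value (the paper does not list the values it used). The function names are invented, and ν > 0 is assumed.

```python
import math


def bessel_j(order, x, panels=400):
    """J_order(x), integer order, from (1/pi) * integral over [0, pi] of
    cos(order*tau - x*sin(tau)), using the trapezoidal rule."""
    h = math.pi / panels
    total = 0.5 * (1.0 + math.cos(order * math.pi))  # end points tau = 0, pi
    for k in range(1, panels):
        tau = k * h
        total += math.cos(order * tau - x * math.sin(tau))
    return total * h / math.pi


def cylinder_roots(nu, x0, count=36, tol=1e-10, max_iter=50):
    """First `count` roots of x*J1(x) = nu*J0(x) by Newton-Raphson iteration."""
    roots = []
    for n in range(count):
        # Starting value built from the asymptotic form, with x0 as parameter.
        x = n * math.pi + math.pi / 4 + nu / (n * math.pi + nu / (x0 - math.pi / 4))
        for _ in range(max_iter):
            j0 = bessel_j(0, x)
            j1 = bessel_j(1, x)
            f = x * j1 - nu * j0
            df = x * j0 + nu * j1       # derivative of x*J1(x) - nu*J0(x)
            step = f / df
            x -= step
            if abs(step) < tol:
                break
        roots.append(x)
    return roots


roots = cylinder_roots(nu=5.0, x0=2.0)
```

The derivative uses the identities d[x J1(x)]/dx = x J0(x) and dJ0(x)/dx = -J1(x), so each Newton step needs only J0 and J1 at the current point.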

Table  I  shows,  as  examples,  the  36  roots  so  computed  for  v  =  5, 
100  and  20,000.  The  values  of  v  are  given  in  the  upper  left  corner  of 
each  data  block.  The  error  in  the  calculation  is  less  than  5x10"^  for 
v  <  100,  and  10“^  for  10^  <  v  <_  10^. 

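IV.  SOLUTION OF  $y_j \sin y_j = C \cos y_j$


We rewrite Eq. (8) as

$$f(x) = x \tan x - C = 0. \qquad (22)$$

For the (n+1)th root we write

$$x = n\pi + \delta, \qquad (23)$$

then

$$\tan x = \tan\delta \qquad (24)$$

and

$$(n\pi + \delta)\tan\delta = C, \qquad \tan\delta = \frac{C}{n\pi + \delta}, \qquad (25)$$

which for large n approaches

$$\delta \approx \frac{C}{n\pi + \delta}. \qquad (25a)$$

The successive roots will be given by different values of n and $\delta$:

$$x = n\pi + \frac{C}{n\pi + \delta}. \qquad (26)$$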

Using the Newton-Raphson method of iteration, we need only an approximate
initial value of x. This, in turn, means that we need only an approximate
value for $\delta$ in the denominator of Eqs. (25) or (26). Obviously, $\delta$ is less
than $\pi/2$. We thus write the approximate value of $\delta$ as

$$\delta \approx \frac{C}{n\pi + \beta\pi}, \qquad (27)$$

where $\beta$ is a parameter which we introduce and whose values are less than 1/2.

The approximate value of the first root $x_0$ is

$$x_0 \approx \frac{C}{\beta\pi}, \qquad (28)$$

and we express Eq. (27) as

$$\delta \approx \frac{C}{n\pi + C/x_0} \qquad (29)$$

and treat $x_0$ in Eq. (29) as a parameter instead of $\beta$.

The initial value of x is given by Eq. (23),

$$x = n\pi + \frac{C}{n\pi + C/x_0},$$

using the approximate value of $\delta$ given by Eq. (29).

As in Section III, it is found by actual computer calculations that,
for a given range of values of C, a single value of $x_0$ in Eq. (29) is
sufficient to compute all of the first 36 roots of Eq. (22) with rapid
convergence. The range of the values of C tested extends from 0 to 7000.

Table  II  shows,  as  examples,  the  36  roots  so  computed  for  C  =  5,  100 
and  7000.  The  values  of  C  are  given  in  the  upper  left  corner  of  each  data 
block.  The  error  in  the  calculation  is  set  =  10“®,  though  the  roots  are 
printed  out  only  to  the  4th  decimal  place. 
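
A minimal Python sketch of this section's recipe follows. It is an illustration only: the function names are hypothetical, the value x0 = 1.3 for C = 5 is an assumed parameter choice, and Newton's iteration is applied to g(x) = x sin x - C cos x (the form used in the title of Table II) rather than to f(x) = x tan x - C, because g has the same roots but no poles. The quadrant test mirrors the self-checking mentioned in Section V.

```python
import math

def initial_guess(n, c, x0):
    """Starting value x = n*pi + C / (n*pi + C/x0), from Eqs. (23) and (29)."""
    return n * math.pi + c / (n * math.pi + c / x0)

def tan_roots(c, x0, count=36, tol=1e-12, max_iter=50):
    """Newton-Raphson on g(x) = x*sin(x) - C*cos(x), one root per n."""
    roots = []
    for n in range(count):
        x = initial_guess(n, c, x0)
        for _ in range(max_iter):
            g = x * math.sin(x) - c * math.cos(x)
            dg = (1.0 + c) * math.sin(x) + x * math.cos(x)
            step = g / dg
            x -= step
            if abs(step) < tol:
                break
        # The (n+1)th root must lie in the quadrant (n*pi, n*pi + pi/2).
        if not (n * math.pi < x < n * math.pi + math.pi / 2):
            raise ValueError("root %d left its quadrant" % n)
        roots.append(x)
    return roots

roots = tan_roots(c=5.0, x0=1.3)
```

For C = 5 the leading entries of `roots` should match the first column of Table II to the four printed decimals.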

V.  CONCLUDING  REMARKS 

The computer programs developed here make use of the NEWTIT subroutine
for the Newton-Raphson method of iteration, and the BSSL subroutine
for the calculation of the Bessel functions $J_0$ and $J_1$. These subroutines,
in the present case, are already in the UNIVAC system. The programs also
check, in each iteration, that the calculated roots fall in the correct
quadrant.

Before concluding, we give the temperature distribution so calculated
for a finite cylinder in Tables III and IV. This is expressed in terms of
the "relative temperature difference" [5]

$$\psi = \frac{T - T_h}{T_i - T_h}.$$

Fig. 1 is the three-dimensional plot of $\psi$ for such a cylinder, as given in
Tables III and IV.

TABLE I.  THE FIRST 36 ROOTS, $x_n$, OF $x_n J_1(x_n) = \nu J_0(x_n)$,
COMPUTED FOR $\nu$ = 5, 100, AND 20,000

   n      nu = 5     nu = 100   nu = 20,000
   0      1.9898      2.3809      2.4047
   1      4.7131      5.4652      5.5198
   2      7.6177      8.5676      8.6533
   3     10.6223     11.6747     11.7909
   4     13.6786     14.7834     14.9302
   5     16.7630     17.8931     18.0702
   6     19.8640     21.0036     21.2106
   7     22.9754     24.1147     24.3513
   8     26.0937     27.2264     27.4921
   9     29.2168     30.3387     30.6331
  10     32.3434     33.4515     33.7741
  11     35.4726     36.5649     36.9153
  12     38.6038     39.6790     40.0564
  13     41.7365     42.7936     43.1976
  14     44.8704     45.9089     46.3389
  15     48.0054     49.0248     49.4801
  16     51.1411     52.1414     52.6214
  17     54.2775     55.2586     55.7627
  18     57.4145     58.3764     58.9040
  19     60.5519     61.4949     62.0454
  20     63.6898     64.6140     65.1867
  21     66.8279     67.7338     68.3281
  22     69.9664     70.8542     71.4694
  23     73.1052     73.9752     74.6108
  24     76.2442     77.0968     77.7521
  25     79.3834     80.2190     80.8935
  26     82.5228     83.3418     84.0349
  27     85.6623     86.4652     87.1763
  28     88.8020     89.5891     90.3177
  29     91.9418     92.7136     93.4590
  30     95.0818     95.8386     96.6004
  31     98.2218     98.9641     99.7418
  32    101.3620    102.0901    102.8832
  33    104.5022    105.2166    106.0246
  34    107.6425    108.3436    109.1660
  35    110.7829    111.4710    112.3074

TABLE II.  THE FIRST 36 ROOTS, $x_n$, OF $x_n \sin x_n = C \cos x_n$,
COMPUTED FOR C = 5, 100, AND 7,000

   n      C = 5      C = 100     C = 7,000
   0      1.3138      1.5552      1.5706
   1      4.0336      4.6658      4.7117
   2      6.9096      7.7764      7.8529
   3      9.8928     10.8871     10.9940
   4     12.9352     13.9981     14.1351
   5     16.0107     17.1093     17.2763
   6     19.1055     20.2208     20.4174
   7     22.2126     23.3327     23.5586
   8     25.3276     26.4450     26.6997
   9     28.4483     29.5577     29.8409
  10     31.5730     32.6709     32.9820
  11     34.7006     35.7847     36.1232
  12     37.8305     38.8989     39.2643
  13     40.9622     42.0138     42.4054
  14     44.0952     45.1292     45.5466
  15     47.2294     48.2452     48.6877
  16     50.3644     51.3618     51.8289
  17     53.5003     54.4790     54.9700
  18     56.6367     57.5969     58.1112
  19     59.7737     60.7154     61.2523
  20     62.9112     63.8345     64.3935
  21     66.0490     66.9543     67.5346
  22     69.1872     70.0746     70.6757
  23     72.3257     73.1956     73.8169
  24     75.4644     76.3171     76.9580
  25     78.6033     79.4393     80.0992
  26     81.7425     82.5620     83.2403
  27     84.8818     85.6853     86.3815
  28     88.0213     88.8092     89.5226
  29     91.1610     91.9336     92.6637
  30     94.3008     95.0585     95.8049
  31     97.4406     98.1839     98.9460
  32    100.5806    101.3099    102.0872
  33    103.7207    104.4363    105.2283
  34    106.8609    107.5631    108.3695
  35    110.0012    110.6904    111.5106


TABLES III & IV.  RELATIVE TEMPERATURE DIFFERENCES, $\psi = (T - T_h)/(T_i - T_h)$, IN TWO CROSS SECTIONS AT z/l = 0 AND 0.6.
THE FIRST COLUMN GIVES THE TIME, COLUMNS 2-7 GIVE $\psi$ FOR DIFFERENT DISTANCES r/a = 0, 0.2, 0.4, 0.6, 0.8, 1.0,
AND COLUMN 8 GIVES THE SURVIVAL FRACTION FOR CL. BOTULINUM SPORES AT r/a = 0.

ALGORITHMS FOR SPARSE, SYMMETRIC, DEFINITE
QUADRATIC $\lambda$-MATRIX EIGENPROBLEMS


David  S.  Scott  and  Robert  C.  Ward 
Computer  Sciences  Division 
Union  Carbide  Corporation  -  Nuclear  Division 
Oak  Ridge,  Tennessee  37830 


ABSTRACT.  Methods are presented for computing eigenpairs of the
quadratic $\lambda$-matrix, $M\lambda^2 + C\lambda + K$, where M, C, and K are large and
sparse, and have special symmetry-type properties. These properties
are sufficient to ensure that all the eigenvalues are real and that
theory analogous to the standard symmetric eigenproblem exists. The
methods employ some standard techniques such as partial
tridiagonalization via the Lanczos method and subsequent eigenpair
calculation, shift-and-invert strategy, and subspace iteration. The
methods also employ some new techniques such as Rayleigh-Ritz quadratic
roots and the inertia of symmetric, definite, quadratic $\lambda$-matrices.

1.  INTRODUCTION.  Quadratic $\lambda$-matrix problems consist of determining
scalars $\lambda$, called eigenvalues, and corresponding n x 1 nonzero
vectors x, called eigenvectors, such that the equation

$$(M\lambda^2 + C\lambda + K)x = 0 \qquad (1)$$

is satisfied, where M, C, and K are given n x n matrices. In addition,
we assume that M, C, and K are symmetric or Hermitian, M is definite
(either positive or negative definite), and the eigenvalues of (1) are
real and can be divided into two disjoint sets P and S with the
following properties:

P1)  If $\lambda_i \in P$ and $\lambda_j \in S$, then $\lambda_i > \lambda_j$.

P2)  If $\lambda_i \in P$ (S) and $x_i$ is its associated eigenvector, then $\lambda_i$
is the larger (smaller) root of the quadratic equation

$$(x_i^* M x_i)\lambda^2 + (x_i^* C x_i)\lambda + (x_i^* K x_i) = 0.$$

The eigenvalues in P will be called primary eigenvalues, and those in S
will be called secondary. Their eigenvectors will be referenced
similarly.

Problems of this nature occur in several application areas; we
will briefly discuss two of them. Lancaster [2] states that the
determination of sinusoidal solutions to the equations of motion for
vibrating systems which are heavily damped results in such a quadratic
$\lambda$-matrix problem. In these overdamped systems M, C, and K are
symmetric, M and C are positive definite, K is non-negative definite,
and the overdamping condition

$$(y^*Cy)^2 - 4(y^*My)(y^*Ky) > 0$$

is satisfied for all vectors $y \ne 0$. Proof that the eigenvalues for
overdamped systems are all real and obey properties P1 and P2 above can
be found in Lancaster [2]. Problem (1) also arises in the dynamic
analysis of rotating structures where the gyroscopic effects cannot be
ignored. (See Wildheim [8] and Lancaster [2].) In gyroscopic systems
M, C, and K are symmetric (Hermitian), M is negative definite, and K is
positive definite. One can determine (Scott and Ward [7]) that all the
eigenvalues are real, that P and S are the positive and negative
eigenvalues, respectively, and that properties P1 and P2 are satisfied.
In both overdamped and gyroscopic systems, the M matrix is usually
called the mass matrix and K the stiffness matrix. Thus, we have
chosen the notation given in (1) rather than the more standard
mathematical notation using A, B, and C for the matrices.

In this paper we present various methods for computing eigenpairs
of these quadratic $\lambda$-matrices when M, C, and K are also large and
sparse. Due to the simplicity of the properties of gyroscopic systems,
our model problem for presentation of the methods will be from this
application area. That is, we will discuss algorithms for computing
eigenpairs of equation (1) where M, C, and K are large, sparse, and
symmetric, M is negative definite, and K is positive definite.

In Section 2 we discuss the approach of transforming the quadratic
problem into a linear one. Some methods based on the factorization of
an n x n matrix are presented in Section 3, and methods not requiring
any factorization are presented in Section 4. We close the paper by
summarizing our results.

2.  LINEARIZATION.  It may be immediately verified that the
eigenpair ($\lambda$, x) satisfies the quadratic problem (1) if and only if it
also satisfies the 2n x 2n linear problem (2),
which we denote as $(A - \lambda B)z = 0$. By the hypotheses on M, C, and K, A
and B are symmetric and B is positive definite. Thus from well known
linear theory, there are 2n real eigenvalues. Applying the Cauchy
interlace theorem to the n x n zero block of A leads to the conclusion
that exactly n of the eigenvalues are positive and n are negative.
Finally, the eigenvectors of the linear problem are B-orthogonal, so
that if ($\lambda_1$, $x_1$) and ($\lambda_2$, $x_2$) are different eigenpairs, then

$$(x_1^* K x_2) - \lambda_1 \lambda_2 (x_1^* M x_2) = 0. \qquad (3)$$

Unfortunately, equation (3) involves both $\lambda_1$ and $\lambda_2$ and does not lead
to a useful deflation technique.
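
One linearization with exactly the properties stated in this section is written out below. It is offered as a reconstruction under the assumption $z = (x, \lambda x)$, and is not necessarily the arrangement the authors used for (2):

$$A = \begin{pmatrix} 0 & K \\ K & C \end{pmatrix}, \qquad
B = \begin{pmatrix} K & 0 \\ 0 & -M \end{pmatrix}, \qquad
z = \begin{pmatrix} x \\ \lambda x \end{pmatrix}.$$

With this choice,

$$(A - \lambda B)z = \begin{pmatrix} -\lambda Kx + \lambda Kx \\ Kx + \lambda Cx + \lambda^2 Mx \end{pmatrix}
= \begin{pmatrix} 0 \\ (M\lambda^2 + C\lambda + K)x \end{pmatrix},$$

so the second block row reproduces (1). A and B are symmetric, A has an n x n zero block, and B is positive definite because K is positive definite and M is negative definite. For real $\lambda_1$, $\lambda_2$ the B-inner product of two such vectors is

$$z_1^* B z_2 = x_1^* K x_2 - \lambda_1 \lambda_2 \, x_1^* M x_2,$$

which is the left-hand side of (3).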

Sparse linear eigenvalue problems have been studied in some detail
and good solution techniques exist. However, a general linear solver
may not be the best choice for solving a quadratic problem, in that the
linear problem has dimension 2n even though the original problem has
dimension n, and no advantage will be taken of the special structure of
A and B. Also, $A - \sigma B$ is not banded even if M, C, and K are, so that
factoring $A - \sigma B$, which is an integral part of most linear solvers, will
require special care to preserve sparsity.

For these reasons we will investigate solution techniques which
take advantage of the underlying quadratic problem.

3.  FACTORIZATION TECHNIQUES.  In this section we show that the
linear problem (2) can be solved using well-known techniques by
factoring an n x n matrix only. The Lanczos algorithm and subspace
iteration appear to require the factorization of the 2n x 2n matrix
$A - \sigma B$. However, what is actually needed is the ability to multiply
vectors by $(A - \sigma B)^{-1}B$. The special structure of the A and B matrices
allows this operator to be realized by factoring only the n x n matrix
$W(\sigma) = M\sigma^2 + C\sigma + K$.


Theorem 1.  Let A and B be as in equation (2) and let
$W(\sigma) = M\sigma^2 + C\sigma + K$. Then

1)  The number of negative eigenvalues of W equals the
number of eigenvalues of $A - \lambda B$ between $\sigma$ and 0.

[si:!- ■ My)] 


The proof is given in Scott [5]. Once the operator $(A - \sigma B)^{-1}B$ has
been realized, it is straightforward to implement subspace
iteration or the Lanczos algorithm (as described in Scott [4]) to find
the eigenvalues of $A - \lambda B$ near $\sigma$. The number of negative eigenvalues of
W can be easily determined as a byproduct of the factorization, and so
the index of the computed eigenvalues can be found.
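
To illustrate how $(A - \sigma B)^{-1}B$ can be applied with a solve involving only $W(\sigma)$, the sketch below assumes the pencil A = [[0, K], [K, C]], B = diag(K, -M) with z = (x, $\lambda$x); that form is an assumption consistent with (1) and (3), and the authors' own block formula may be arranged differently. Under that assumption, solving $(A - \sigma B)(u, v) = B(p, q)$ reduces to $W(\sigma)u = -[(C + \sigma M)p + Mq]$ and $v = p + \sigma u$. The helper names are hypothetical, and dense Gaussian elimination stands in for the sparse factorization of W that a real code would compute once and reuse.

```python
def matvec(a, v):
    """Dense matrix-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def solve(a, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for t in range(col, n + 1):
                aug[r][t] -= f * aug[col][t]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][t] * x[t] for t in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x

def w_matrix(m, c, k, sigma):
    """W(sigma) = M*sigma^2 + C*sigma + K."""
    n = len(m)
    return [[m[i][j] * sigma ** 2 + c[i][j] * sigma + k[i][j]
             for j in range(n)] for i in range(n)]

def apply_shift_invert(m, c, k, sigma, p, q):
    """Return (u, v) with (A - sigma*B)[u; v] = B[p; q] for the assumed pencil."""
    cp, mp, mq = matvec(c, p), matvec(m, p), matvec(m, q)
    rhs = [-(cpi + sigma * mpi + mqi) for cpi, mpi, mqi in zip(cp, mp, mq)]
    u = solve(w_matrix(m, c, k, sigma), rhs)
    v = [pi + sigma * ui for pi, ui in zip(p, u)]
    return u, v
```

Only the n x n matrix W is ever solved with; the 2n x 2n matrix $A - \sigma B$ is never formed.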

If many eigenvalues are desired, then a sequence of shifts $\sigma$ can be
used. The eigenvalue count then gives the number of eigenvalues
between two consecutive shifts, so that no eigenvalue can be missed
without detection.

4.  NONFACTORIZATION TECHNIQUES.  In this section we assume that
the factorization of M, C, K, or any linear combination of them is
either impossible or undesirable. Thus, we are basically limited to
algorithms similar to the Lanczos Rayleigh Quotient algorithm presented
by Scott [6] for the linear pencil eigenproblem, which uses only
matrix-vector multiplications.

We have developed an algorithm based on techniques for determining
the "best" approximation to an eigenvalue given an approximate
eigenvector and the "best" approximation to an eigenvector given an
approximate eigenvalue. The algorithm alternates between these
approximations until convergence, as the following outline illustrates:

I.  Set the vector $x_0$ to random numbers.

II.  For i = 1, 2, ... until convergence, do a and b.

a.  Determine "best" $\sigma_i$ from $x_{i-1}$.

b.  Determine "best" $x_i$ from $\sigma_i$.

Step II.a uses a generalization of the Rayleigh quotient different
from that of Lancaster [2] and specifically designed for the
quadratic problem. Given any nonzero vector x, potential eigenvectors
of the linear pencil (A, B) given by (2) would be linear combinations of
the vectors $[x^*, 0]^*$ and $[0, x^*]^*$. Using the Rayleigh-Ritz procedure,
the "best" approximations to eigenvectors in this space and the
corresponding eigenvalues can be determined. "Best" in this context means
minimizing the Frobenius norm of the 2 x 2 scaled residual matrix (see
Parlett [3]). The characteristic equation of the reduced linear pencil
in the Rayleigh-Ritz procedure is equivalent to the quadratic equation

$$(x^*Mx)\theta^2 + (x^*Cx)\theta + (x^*Kx) = 0. \qquad (4)$$

Thus, the approximations to two eigenvalues of the quadratic $\lambda$-matrix
are given by its roots, $\theta^+(x)$ and $\theta^-(x)$, which can be easily determined
by the quadratic formula. If we are trying to converge to a positive
(primary) eigenvalue, then the larger root $\theta^+(x)$ is chosen for $\sigma_i$;
conversely, the smaller root $\theta^-(x)$ is chosen when trying to converge to
a negative (secondary) eigenvalue. The roots of (4) are identical to
the primary and secondary functionals discussed by Duffin [1].
However, Duffin does not present a theoretical basis for how and why
these roots, along with x, most closely approximate an eigenpair of the
quadratic $\lambda$-matrix. A more thorough discussion of Rayleigh quotient
generalizations can be found in Scott and Ward [7].
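
The two roots of (4) are cheap to evaluate, since only three quadratic forms are needed. The following sketch (real symmetric case, dense lists for brevity, hypothetical function names) returns the pair ($\theta^+$, $\theta^-$) for a given vector x.

```python
import math

def quad_form(a, x):
    """x^T a x for a dense square matrix a and vector x."""
    n = len(x)
    return sum(x[i] * a[i][j] * x[j] for i in range(n) for j in range(n))

def rayleigh_ritz_roots(m, c, k, x):
    """Return (theta_plus, theta_minus), the roots of
    (x^T M x) t^2 + (x^T C x) t + (x^T K x) = 0."""
    a, b, g = quad_form(m, x), quad_form(c, x), quad_form(k, x)
    disc = b * b - 4.0 * a * g
    if disc < 0.0:
        raise ValueError("complex roots: hypotheses on M, C, K not met")
    r = math.sqrt(disc)
    t1, t2 = (-b + r) / (2.0 * a), (-b - r) / (2.0 * a)
    return max(t1, t2), min(t1, t2)

# Gyroscopic-type example: M negative definite, K positive definite.
M = [[-1.0, 0.0], [0.0, -2.0]]
C = [[1.0, 0.0], [0.0, 0.0]]
K = [[2.0, 0.0], [0.0, 8.0]]
theta_plus, theta_minus = rayleigh_ritz_roots(M, C, K, [1.0, 0.0])
```

In this diagonal example x = (1, 0) is an exact eigenvector, so the two roots are exact eigenvalues: one positive (primary) and one negative (secondary).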

Step II.b is based on the observation that if $\sigma$ is an eigenvalue
of the quadratic $\lambda$-matrix with x as its eigenvector, the matrix $W(\sigma)$
defined in Theorem 1 has the eigenpair (0, x). Theorem 1 relates the
eigenvalues of the symmetric matrix $W(\sigma)$ to the primary and secondary
eigenvalues of the quadratic $\lambda$-matrix. Thus, the eigenvalue to which we
are converging can be controlled by selecting the appropriate
eigenvector of $W(\sigma)$ in Step II.b. For example, the following algorithm
is used to converge to the m smallest positive eigenvalues:

I.  Set the vector $x_0$ to random numbers.

II.  For k = 1, 2, ..., m, do 1 and 2.

1.  For i = 1, 2, ... until convergence, do a and b.

a.  Set $\sigma_i = \theta^+(x_{i-1})$.

b.  Set $x_i = y_k$, where ($\mu_j$, $y_j$) are the eigenpairs of
$W(\sigma_i)$ with $\mu_1 \le \mu_2 \le \cdots \le \mu_n$ and the $y_j$ unit-length.

2.  Set $x_0$ to the $y_{k+1}$ computed in step 1.b above.


From Scott and Ward [7], we know that the sequence {$\sigma_i$} for k = 1
converges monotonically downward to the smallest positive eigenvalue,
and the convergence is asymptotically quadratic. Also, the algorithm
is expected to converge quadratically to the other m-1 eigenvalues, but
convergence is not guaranteed.
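
The k = 1 case of this algorithm can be tried on a 2 x 2 problem, where the eigenvector of $W(\sigma)$ has a closed form. The sketch below is restricted to n = 2 for that reason; for general n the closed-form routine would be replaced by a symmetric eigensolver, or by a Lanczos-type method when only matrix-vector products are allowed. The function names and the test matrices are illustrative assumptions.

```python
import math

def quad_form(a, x):
    return sum(x[i] * a[i][j] * x[j] for i in range(2) for j in range(2))

def theta_plus(m, c, k, x):
    """Larger root of (x^T M x) t^2 + (x^T C x) t + (x^T K x) = 0."""
    a, b, g = quad_form(m, x), quad_form(c, x), quad_form(k, x)
    r = math.sqrt(b * b - 4.0 * a * g)
    return max((-b + r) / (2.0 * a), (-b - r) / (2.0 * a))

def smallest_eigvec_2x2(w):
    """Unit eigenvector y_1 for the smaller eigenvalue of a symmetric 2x2 w."""
    a, b, d = w[0][0], w[0][1], w[1][1]
    if b == 0.0:
        return [1.0, 0.0] if a <= d else [0.0, 1.0]
    mu = 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
    v = [b, mu - a]  # satisfies (a - mu) * v0 + b * v1 = 0
    nrm = math.hypot(v[0], v[1])
    return [v[0] / nrm, v[1] / nrm]

def smallest_positive_eigenvalue(m, c, k, x, tol=1e-13, max_iter=50):
    """Alternate sigma_i = theta_plus(x_{i-1}) and x_i = y_1 of W(sigma_i)."""
    sigma_prev = None
    sigma = theta_plus(m, c, k, x)
    for _ in range(max_iter):
        if sigma_prev is not None and abs(sigma - sigma_prev) < tol:
            break
        w = [[m[i][j] * sigma * sigma + c[i][j] * sigma + k[i][j]
              for j in range(2)] for i in range(2)]
        x = smallest_eigvec_2x2(w)
        sigma_prev, sigma = sigma, theta_plus(m, c, k, x)
    return sigma, x

M = [[-1.0, 0.0], [0.0, -1.0]]
C = [[0.0, 1.0], [1.0, 0.0]]
K = [[2.0, 0.0], [0.0, 3.0]]
sigma, x = smallest_positive_eigenvalue(M, C, K, [1.0, 1.0])
```

For these matrices det W($\lambda$) = $\lambda^4 - 6\lambda^2 + 6$, so the smallest positive eigenvalue is $\sqrt{3 - \sqrt{3}}$, which gives an independent check on the iteration.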

A minor modification can be made to the algorithm to guarantee
quadratic convergence to interior primary or secondary eigenvalues.
This modification requires the solution of a 2k x 2k dense linear
pencil eigenproblem in step II.1.a and the computation of k
eigenvectors in step II.1.b. The following algorithm is guaranteed to
converge quadratically to the m smallest positive eigenvalues:

I.  Set the vector $y_1$ to random numbers.

II.  For k = 1, 2, ..., m, do 1 and 2.

1.  Set the r-th column of the n x k matrix $X_0$ to $y_r$
from step I if k = 1 or from step II.2.b otherwise.

2.  For i = 1, 2, ... until convergence, do a and b.

a.  Set $\sigma_i = \theta_k$, where $\theta_{-k} \le \theta_{-k+1} \le \cdots \le \theta_{-1} < 0 < \theta_1 \le \cdots \le \theta_k$
are the eigenvalues of the 2k x 2k dense linear pencil formed from $X_{i-1}$.

b.  Set the r-th column of $X_i$ to $y_r$, where ($\mu_j$, $y_j$)
are the eigenpairs of $W(\sigma_i)$ with $\mu_1 \le \mu_2 \le \cdots \le \mu_n$
and the $y_j$ are unit-length.

Similar algorithms can be developed for computing the m largest
positive eigenvalues and the m largest and smallest negative eigenvalues.

5.  CONCLUSIONS.  In this paper we have presented several
techniques for solving symmetric, definite, quadratic $\lambda$-matrix problems.
These techniques are more efficient, in general, than applying linear
techniques to the equivalent 2n x 2n linear problem. The convergence
rates of the methods based on factoring $W(\sigma)$ are superior to the
convergence rates of the nonfactorization methods presented in Section
4, and so the factorization methods should always be used if the
factorization is possible. If the nonfactorization methods must be
used, then it is still possible to use preconditioning techniques as in
Scott [6] to improve the convergence, if desired. Portable software
implementing these algorithms should be available in the near future.

ABSTRACT


Very large scale least squares problems arise in a variety of applications,
including geodetic network adjustments, multiple regression analysis, photogrammetry,
earthquake studies, instrumentation planning, and certain types of finite element
analysis. For example, the adjustment of a geodetic network with 6,000,000
observations and 400,000 unknowns is being considered. In this paper a new automatic
ordering and partitioning scheme for large sparse observation matrices is developed.
The method parallels somewhat the concept of block triangularization of square
unsymmetric linear systems. Comparisons are made with automatic ordering schemes
based upon software from the sparse matrix package SPARSPAK. These comparisons
are made by investigating the computational efficiency of solving the resulting
least squares problems using orthogonal decompositions by Givens reduction.

I.  INTRODUCTION AND OVERVIEW


1.  Introduction.  Let A be an m x n sparse matrix with $m \ge n$ and consider
the system of linear equations

$$Ax = b \qquad (1.1)$$

where b is a fixed vector of length m. In general, (1.1) may not have a
solution x. In such cases, (1.1) is usually solved in the least squares sense; that
is, the solution x is chosen to minimize the Euclidean norm of the residual vector

$$r = b - Ax.$$

We shall assume throughout that A has full column rank n. Under this assumption
it is easy to verify that the least squares solution x is unique and satisfies
the normal equations

$$A^TAx = A^Tb. \qquad (1.2)$$

Since the matrix $A^TA$ is symmetric and positive definite, (1.2) can often be
solved efficiently by the Cholesky algorithm. Moreover, in the sparse case, there
exists well-developed software for ordering the rows and columns of $A^TA$ to reduce
the fill-in during the solution process. Such software is available, for example,
in the sparse matrix package SPARSPAK (see George and Liu [1978]). In particular,
an ordering is determined in terms of a permutation matrix P so that the Cholesky
factor R for

$$PA^TAP^T \qquad (1.3)$$

suffers less fill-in than the Cholesky factor of $A^TA$. Here
(1.3) is factored into $R^TR$, where R is upper triangular, and then x is computed
by solving the two triangular systems $R^Ty = PA^Tb$ and $RPx = y$.

Unfortunately, the normal equations method may be numerically unstable. This
is due to the potential loss of information in explicitly forming $A^TA$ and $A^Tb$,
and to the fact that the condition number of $A^TA$ is the square of that of A.
In addition, $A^TA$ may no longer be as sparse as the original matrix A.


A well-known stable alternative to the computation of x by solving the normal
equations (1.2) is provided by orthogonal factorization (Golub [1965]). The
original matrix A is reduced by orthogonal reduction to

$$QA = \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad Qb = \begin{bmatrix} y \\ z \end{bmatrix},$$

where Q represents an orthogonal matrix and where R is the Cholesky factor of
$A^TA$. The least squares solution x to (1.1) is then obtained by solving the
triangular system $Rx = y$. The matrix Q usually results from Gram-Schmidt
orthogonalization or from a sequence of Householder or Givens transformations. Both
the Gram-Schmidt and Householder algorithms process the unreduced part of A by
columns and can cause severe intermediate fill-in. The use of Givens rotations is
much more attractive in that the matrix A is processed by rows, gradually building
up R, and intermediate fill-in is confined to the working row. Thus, as indicated
in Golub and Plemmons [1980], Givens rotations are generally preferable in the
sparse case.
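
The row-by-row character of Givens reduction can be seen in the following dense sketch, which is an illustration only (hypothetical function name, no sparse data structures, full column rank assumed). Each incoming row of A is rotated against the rows of R already present, so the only storage touched besides R is the single working row.

```python
import math

def givens_least_squares(rows, rhs, n):
    """Minimize ||b - A x|| by rotating the rows of A, one at a time,
    into an n x n upper triangular R (A must have full column rank)."""
    r = [[0.0] * n for _ in range(n)]
    y = [0.0] * n
    for row, beta in zip(rows, rhs):
        w = list(row)  # the working row
        for j in range(n):
            if w[j] == 0.0:
                continue
            if r[j][j] == 0.0:
                # Row j of R is still empty: the working row moves in.
                r[j][j:] = w[j:]
                y[j] = beta
                break
            h = math.hypot(r[j][j], w[j])
            c, s = r[j][j] / h, w[j] / h
            for t in range(j, n):
                rjt, wt = r[j][t], w[t]
                r[j][t] = c * rjt + s * wt
                w[t] = -s * rjt + c * wt
            y[j], beta = c * y[j] + s * beta, -s * y[j] + c * beta
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution on R x = y
        x[i] = (y[i] - sum(r[i][t] * x[t] for t in range(i + 1, n))) / r[i][i]
    return x, r

# Fit b ~ x0 + x1 * t at t = 0, 1, 2.
x, r = givens_least_squares([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],
                            [1.0, 2.0, 4.0], 2)
```

The returned R satisfies $R^TR = A^TA$ without $A^TA$ ever being formed, which is the property the normal equations approach lacks.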

2.  Sources of Sparse Problems.  Large scale sparse least squares problems of
ever increasing size have arisen in recent years. One reason for this is that
modern data acquisition technology allows the collection of massive amounts of data.
Another factor is the tendency of scientists to formulate more and more complex
and comprehensive models in order to obtain finer resolution and more realistic
detail in describing physical systems. Particular areas in which such large-scale
least squares problems occur include geodetic network adjustments (Golub and
Plemmons [1980], Meissl [1980]), photogrammetry (Golub, Luk and Pagano [1980]),
earthquake studies (Vanicek, Elliott and Castle [1979]), instrumentation planning
(Agee, Turner and Meyer [1976]), and the natural factor formulation of the finite
element problem (Argyris and Brønlund [1975], Argyris et al [1978]). An example of
truly spectacular size is the least squares adjustment of coordinates (latitudes and
longitudes) of stations comprising the North American Datum, which is to be completed
in 1983 by the U.S. National Geodetic Survey (Kolata [1978]). This enormous task
requires solving, perhaps several times, a least squares problem having 6,000,000
equations in 400,000 unknowns.

3.  A Sparse Least Squares Algorithm.  In Golub and Plemmons [1980], an
orthogonal decomposition procedure was suggested for solving large scale sparse
least squares problems such as those that arise in geodetic adjustment problems.
The method has been further developed and coded by George and Heath [1980], where
some preliminary tests on geodetic data are reported. This algorithm consists
essentially of the following steps:

1.  Determine the adjacency structure of the normal equations matrix $A^TA$.

2.  Order the columns of A by a permutation P so that $P^TA^TAP$ has a sparse
Cholesky factor R.

3.  Symbolically factorize $P^TA^TAP$, generating a row-oriented data structure
for R.

4.  Compute R by processing the rows of AP one-by-one using Givens rotations.

Notice that Step 2 produces an indirect ordering of A, by considering the
structure of $A^TA$. The ordering tested by George and Heath [1980] was the quotient
minimum degree algorithm (see George and Liu [1981], Chapter 5).
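
Step 1 needs only the sparsity pattern of A: two columns are adjacent in the graph of $A^TA$ whenever some row of A has nonzeros in both. A minimal sketch, with a hypothetical function name and rows given as lists of column indices, is shown below. It is structural only, so accidental numerical cancellation in $A^TA$ is ignored.

```python
def normal_equations_adjacency(row_patterns, n):
    """Adjacency sets of the graph of A^T A.

    row_patterns[i] lists the column indices of the nonzeros in row i of A;
    n is the number of columns.
    """
    adj = [set() for _ in range(n)]
    for cols in row_patterns:
        for i in cols:
            for j in cols:
                if i != j:
                    adj[i].add(j)
    return adj

# Three observations on four unknowns.
adj = normal_equations_adjacency([[0, 1], [1, 2], [3]], 4)
```

A row with p nonzeros contributes a clique on p columns, which is why a few dense rows in A can destroy the sparsity of $A^TA$.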

4.  Overview.  The purpose of this paper is to provide a direct ordering scheme
for the observation matrix A as an alternative to Step 2. A scheme based upon
permuting A to block upper trapezoidal form is given in Section II. In Section III
our block triangularization scheme is compared with two indirect ordering schemes
from the package SPARSPAK: the quotient minimum degree algorithm used by George
and Heath [1980] and the nested dissection algorithm described in George and Liu
[1981], Chapter 8. The results of these tests are also reported in Section III.
It is found that our direct ordering scheme is quite competitive with the indirect
schemes for the examples we tested. This and other observations are summarized in
Section IV.

II.  A DIRECT ORDERING SCHEME


1.  Introduction.  We now discuss an algorithm for permuting a rectangular
matrix A of dimension m x n, $m \ge n$, and column rank n, into a block upper
trapezoidal form given by

$$PAQ = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1k} \\ & A_{22} & \cdots & A_{2k} \\ & & \ddots & \vdots \\ & & & A_{kk} \end{bmatrix}.$$

In the above equation P and Q represent permutation matrices, while the $A_{ii}$,
$1 \le i \le k$, are rectangular matrices of dimensions $m_i \times n_i$, with $m_i \ge n_i$.

The square case, where m = n and $m_i = n_i$ for $1 \le i \le k$, is examined first.
An algorithm developed by Duff [1979] for permuting a matrix to obtain a zero-free
main diagonal and also an algorithm by Duff and Reid [1978] for permuting a matrix
with a zero-free main diagonal to block upper triangular form are reviewed.

The rectangular case, where m > n and $m_i \ge n_i$ for $1 \le i \le k$, is then
examined, where the algorithms for the square case are modified to accommodate a
rectangular matrix. An algorithm is provided to select the rows with fewest nonzero
entries when obtaining the zero-free diagonal mentioned above. A heuristic
justification is given for this algorithm. Also an algorithm is presented which
incorporates the rows not included in the intermediate block upper triangular
structure to effect a final block upper trapezoidal structure. Following the
presentation of this algorithm is a brief discussion on uniqueness.

2.  The Square Case.  The problem of permuting a square matrix A to block
upper triangular form with square, indecomposable, diagonal blocks is reviewed first.
Algorithms for determining certain permutation matrices R and Q such that $Q^TRAQ$
is block upper triangular were developed in Duff and Reid [1978].

The advantages of expressing a square sparse matrix A in a block upper
triangular form have to do mainly with the following facts:

1.  The eigenvalues of A are those of the diagonal blocks $A_{ii}$, and hence
those of A can be computed more easily.

2.  When solving Ax = b with A block upper triangular by Gaussian elimination,
the elimination process can be restricted to the diagonal blocks.

3.  A matrix in block upper triangular form can be stored by blocks. Depending
upon the sizes of the diagonal blocks, storage of only the nonzero upper
blocks can reduce storage requirements by nearly one-half in many
computational problems.

Advantages also exist for permuting rectangular matrices to block upper trapezoidal
form. These advantages involve least squares computations.


The following definitions will facilitate the remaining discussion. A matrix
A is said to be decomposable if there exist permutation matrices P and Q such
that

$$PAQ = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix},$$

where $A_{11}$ and $A_{22}$ are square. Otherwise A is indecomposable. A matrix A is
said to be reducible if there exists a permutation matrix Q such that

$$Q^TAQ = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix},$$

where $A_{11}$ and $A_{22}$ are square. Otherwise A is irreducible. Thus a decomposable
matrix can be asymmetrically permuted to block upper triangular form, while a
reducible matrix can be symmetrically permuted to such a form. Indecomposability
implies irreducibility, but the converse of this statement is not true.


A  transversal  of  a  matrix  is  a  set  of  nonzero  elements  of  the  matrix,  no  two 
of  which  are  on  the  same  row  or  column.  The  length  of  a  transversal  refers  to  the 
number  of  elements  included  in  it.  A  maximal  transversal  of  a  matrix  is  a  transversal 
of  greatest  length.  Of  course  a  matrix  can  have  more  than  one  maximal  transversal. 

A matrix A with full column rank n must have at least one transversal of length
n. This fact can be shown in the following way. Suppose A has full column rank
n and has a maximal transversal of length k < n. Then there exist n - k columns
of A that do not include a transversal element. If the rows and columns of A are
now permuted by P and Q so that

$$PAQ = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

where $A_{11}$ is a k x k matrix with main diagonal consisting of the k transversal
elements, then the n - k columns of $\begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix}$ do not contain any transversal elements.
It follows from the assumption that the maximal transversal has length k, that all
entries of the (n - k) x (n - k) matrix $A_{22}$ are zero. Now form the set V of the
k columns of $\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix}$. There exists a subset S of V consisting of columns
which contain the transversal element in a row in which one of the columns of
$\begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix}$ has a nonzero entry. The columns in S contain no nonzeros below their
entry; otherwise the assumption that the maximum transversal is of length k is
violated. The columns of S are linearly independent and form a basis for a vector
space of which the columns of $\begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix}$ are vectors. Therefore the columns of PAQ
are not linearly independent and A cannot have column rank n. A contradiction
has been reached. Thus every matrix A of full column rank n must have a
transversal of length n. It should be noted that this transversal is generally not
unique, and also that a matrix of column rank less than n may still possess a
transversal of length n. Once a transversal is selected, row permutations are
sufficient to bring the transversal elements to the main diagonal.

A  simple  proof  of  the  following  well-known  fact  can  be  found  in  Duff  and  Reid 
[1978]  and  in  George  and  Gustavson  [1980]. 


Fact.  If  a  square  matrix  A  has  a  zero-free  main  diagonal,  then  A  is 
irreducible  if  and  only  if  it  is  indecomposable. 

Thus  given  that  a  square  matrix  in  block  upper  triangular  form  has  a  zero-free 
main  diagonal,  irreducible  diagonal  blocks  are  also  indecomposable.  The  usefulness 
of  this  fact  will  become  apparent  momentarily. 

The  Duff-Reid  algorithms  for  transversal  selection  and  block  triangularization 
of  square  matrices  are  briefly  summarized  below.  A  more  complete  description  can  be 
found in Duff [1978] and Duff and Reid [1978].


The  Duff  approach  to  the  selection  of  a  transversal  consists  of  two  phases. 

In  the  cheap  assignment  phase,  each  row  is  examined  and  within  that  row  the  locations 
of  the  nonzeroes  are  determined.  As  soon  as  a  nonzero  is  found  in  a  column  with  no 
transversal  element  assigned  to  it,  that  nonzero  is  included  in  the  transversal, 
and  the  next  row  is  examined.  After  completion  of  the  cheap  assignment  phase,  the 
transversal  is  generally  not  of  maximum  length.  The  task  of  selecting  a  maximal 
transversal  falls  to  the  second  phase  of  the  algorithm,  the  depth  first  search.  In 
the depth first search phase, each row $i_0$ not already containing a transversal
element is examined and a graph theoretical reassignment chain is constructed as
follows. The chain of row indices

$$i_0 \to i_1 \to i_2 \to \cdots \to i_k$$

implies that element $a_{i_1 j_1}$ is currently a transversal element and $a_{i_0 j_1}$
is nonzero; $a_{i_2 j_2}$ is currently a transversal element and $a_{i_1 j_2}$ is
nonzero; etc. If such a chain can be formed such that there exists a nonzero
$a_{i_k j_0}$ which is in a column $j_0$ not currently containing a transversal
element, a reassignment is performed. This means that the elements

$$\{a_{i_1 j_1},\ a_{i_2 j_2},\ \ldots,\ a_{i_k j_k}\}$$

belonging to the previous transversal are deleted from the transversal and the elements

$$\{a_{i_0 j_1},\ a_{i_1 j_2},\ \ldots,\ a_{i_k j_0}\}$$

replace them, thereby increasing the length of the transversal by one. For a matrix of full
column  rank  n  ,  sufficient  repetitions  of  this  scheme  will  produce  a  transversal  of 
length  n  .  Finally  a  row  permutation  is  performed,  bringing  the  transversal  entries 
to  the  main  diagonal. 
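
The two phases described above can be sketched in Python as follows. This is only a minimal illustration and not the Harwell MC21A code; the function name `maximum_transversal`, the list-of-column-indices input, and the 0-based indexing are assumptions made for this example.

```python
def maximum_transversal(rows, n):
    """Return col_of_row, where col_of_row[i] is the column of the transversal
    element chosen in row i (or None if row i holds no transversal element).

    rows[i] lists the column indices of the nonzeroes in row i (hypothetical
    input format); n is the number of columns.
    """
    m = len(rows)
    row_of_col = [None] * n   # row currently owning the transversal element in column j
    col_of_row = [None] * m

    # Phase 1: cheap assignment. Take the first nonzero in a still-free column.
    for i in range(m):
        for j in rows[i]:
            if row_of_col[j] is None:
                row_of_col[j] = i
                col_of_row[i] = j
                break

    # Phase 2: depth first search for a reassignment chain i0 -> i1 -> ... -> ik.
    def augment(i, seen):
        for j in rows[i]:
            if j in seen:
                continue
            seen.add(j)
            owner = row_of_col[j]
            # Either column j is free, or its owner can be moved elsewhere.
            if owner is None or augment(owner, seen):
                row_of_col[j] = i
                col_of_row[i] = j
                return True
        return False

    for i in range(m):
        if col_of_row[i] is None:
            augment(i, set())
    return col_of_row


# Row 1 is left unassigned by the cheap phase and is placed by a reassignment chain.
print(maximum_transversal([[0, 1], [0], [1, 2]], 3))   # prints [1, 0, 2]
```

Sorting the rows by `col_of_row` then gives the row permutation that brings the transversal entries to the main diagonal.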


The Duff and Reid [1978] implementation of the Tarjan algorithm for permuting
a matrix to block lower triangular form is easily adapted to the task of permuting a
matrix to block upper triangular form. The algorithm is best applied to matrices
already permuted so that the main diagonal is zero-free. For such matrices, a
symmetric (rather than an asymmetric) permutation will be sufficient to put the matrix
in block triangular form with indecomposable diagonal blocks. To construct the
permutation matrix, a directed graph of the matrix A is formed with one vertex of
the digraph corresponding to each row of the matrix. Edges of the directed graph
are defined in the following way. The edge $i \to j$ means $a_{ij} \neq 0$, so that a
one-to-one correspondence exists between off-diagonal nonzeroes of the matrix and
edges of the graph. A set of vertices any one of which may be reached from any
other in the set by travelling along a set of edges is said to be a connected
component. If no other vertex may be added to a component without destroying its
connectedness, the set is said to be a maximal connected component. Every directed
graph has a unique set of maximal connected components and once these are found for
the directed graph under consideration, a symmetric permutation is formed to relabel
the vertices such that all vertices within a maximal connected component are labelled
consecutively and such that if there exists an edge from a vertex in component i to
a vertex in component j, then all vertices in component i are labelled before
all those in component j.
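
The relabelling just described can be sketched in Python with a textbook recursive form of Tarjan's algorithm. This is an illustration under stated assumptions, not the MC13D code: the function name `block_upper_triangular_order`, the adjacency-list input, and the 0-based indexing are hypothetical choices for the example.

```python
def block_upper_triangular_order(adj):
    """Return the maximal (strongly) connected components of the digraph,
    ordered so that every edge between components runs from an earlier
    component to a later one.

    adj[i] lists the vertices j != i with a_ij != 0 (hypothetical input format).
    """
    n = len(adj)
    index = [None] * n        # discovery number of each vertex
    low = [0] * n             # smallest discovery number reachable from the vertex
    on_stack = [False] * n
    stack, comps = [], []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack[v] = True
        for w in adj[v]:
            if index[w] is None:
                visit(w)
                low[v] = min(low[v], low[w])
            elif on_stack[w]:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop the component off the stack.
            comp = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                comp.append(w)
                if w == v:
                    break
            comps.append(sorted(comp))

    for v in range(n):
        if index[v] is None:
            visit(v)
    # Tarjan emits components with "sink" components first; reversing the list
    # labels component i before component j whenever an edge runs from i to j.
    comps.reverse()
    return comps


# Off-diagonal structure with nonzeroes a_01 and a_20 (0-based): edges 0 -> 1 and 2 -> 0.
print(block_upper_triangular_order([[1], [], [0], []]))   # prints [[3], [2], [0], [1]]
```

Concatenating the components in the returned order gives one admissible symmetric permutation; as the text notes next, the resulting form is not unique.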

Although the final block upper triangular form is not unique, it is unique up
to symmetric permutations within diagonal blocks and the order of the diagonal
blocks along the diagonal. Another important fact is that the final form is
independent of which maximum transversal is selected. This is because if rows (or
columns) i and j can be interchanged in RA and a zero-free diagonal maintained,
then vertices i and j will be in the same connected component and all four
elements $a_{ii}$, $a_{ij}$, $a_{ji}$, $a_{jj}$ will end up in one of the diagonal blocks
regardless of which transversal is utilized. The scheme discussed here for permuting a
square matrix to block upper triangular form is modified next to produce a scheme for
permuting rectangular matrices to block upper trapezoidal form.

3. The Rectangular Case. We now present a new algorithm for permuting a
matrix A of dimension m x n, m > n with full column rank, to block upper trapezoidal
form. The algorithm consists of four main steps. The first and third steps
are modifications of the two steps of the algorithm described earlier for permuting
square matrices to block upper triangular form. The four steps are as follows:

1. A transversal of length n is chosen and a permutation matrix R formed
so that

$$RA = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$$

where $A_1$ is a square n x n matrix with a zero-free main diagonal.

2. In general, matrix A can have more than one transversal of length n.
This implies that the matrix $A_1$ chosen above is not unique. In this step
the rows of RA are permuted so that $A_1$ is replaced by $A_1'$ which contains
as few nonzeroes as possible and still maintains a zero-free main diagonal.
At this point in the algorithm

$$R'(RA) = R' \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} = \begin{bmatrix} A_1' \\ A_2' \end{bmatrix}$$

where $A_1'$ consists of the n rows of the original matrix A which contain
the fewest number of nonzeroes and can still be permuted to yield a zero-free
diagonal.

3. In this step a permutation matrix Q is formed to symmetrically permute
the square matrix $A_1'$ to block upper triangular form,

$$\begin{bmatrix} Q^T & 0 \\ 0 & I_{m-n} \end{bmatrix} R'RAQ = \begin{bmatrix} A_1^* \\ A_2^* \end{bmatrix}$$

where $A_1^*$ is block upper triangular with indecomposable diagonal blocks.

4. Finally, a block upper trapezoidal form is achieved by forming a permutation
matrix P such that

$$P \begin{bmatrix} A_1^* \\ A_2^* \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1p} \\ 0 & A_{22} & \cdots & A_{2p} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & A_{pp} \end{bmatrix}$$

where each $A_{ii}$ is a rectangular block of dimension $m_i \times n_i$, with $m_i \ge n_i$.

Each step is now examined in more detail.


Selecting an $A_1$

Selection of a transversal for A is achieved by slightly modifying the Harwell
subroutine MC21A (Duff [1978]) so that all m rows may be scanned during the cheap
assignment and depth first search phases. Selecting a transversal of length n is
generally faster for an m x n matrix of rank n with m > n than for an n x n
matrix of rank n; for, since there are more rows to scan during the cheap assignment
phase, one would expect a greater number of transversal elements to be found during
cheap assignment in the rectangular matrix, thereby reducing the number which must
be found by the more expensive depth first search. Output of the modified MC21A
subroutine is an m-vector defining a row permutation of A to

$$\begin{bmatrix} A_1 \\ A_2 \end{bmatrix}.$$

The permutation matrix R is, of course, never formed.


Selecting an $A_1'$

The matrix $A_1'$, as noted earlier, is a member with fewest nonzeroes of the set
of all possible n x n matrices with zero-free diagonal which may be formed by
choosing n rows of A. $A_1'$ is not unique, a fact that will be discussed later.

The selection of $A_1'$ is motivated by heuristic considerations. If it is
agreed that a final block upper trapezoidal structure with more zeroes below the
diagonal blocks is more desirable than an alternate block upper trapezoidal structure
with fewer such zeroes, then it is often better to have narrower (and thus more)
diagonal blocks. Since the width of the diagonal blocks is completely determined by
the sizes of the square diagonal blocks in $A_1^*$ at the end of step three, it follows
that one would like to promote, if possible, smallness of these blocks. This may be
done, in a heuristic sense, by choosing a matrix $A_1'$ for block upper triangularization
in step three for which the associated graph has few edges and therefore tends
to have more, smaller, connected components rather than a few large ones. The size
of the connected components determines the size of the diagonal blocks. Using an
arbitrary $A_1$ rather than $A_1'$ can have a drastic effect on the block structure of
the final form.


In the example below, the rows of RA in (2.1) are permuted to yield R'RA
in (2.2). The effect of this permutation is to replace an $A_1$ submatrix with the
$A_1'$ submatrix.

$$RA = \left[\begin{array}{cccc} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \hline 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{array}\right] \tag{2.1}$$

$$R'RA = \left[\begin{array}{cccc} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \hline 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{array}\right] \tag{2.2}$$

Note that the $A_1$ submatrix in (2.1) is in block upper triangular form, with
one 3 x 3 diagonal block and one 1 x 1 diagonal block. The $A_1'$ submatrix in
(2.2) is not yet in indecomposable block upper triangular form. This form is shown
below. Note the four 1 x 1 diagonal blocks.


$$\begin{bmatrix} Q^T & 0 \\ 0 & I_{m-n} \end{bmatrix} R'RAQ = \left[\begin{array}{cccc} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \hline 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \end{array}\right] \tag{2.3}$$

The following algorithm finds an $A_1'$ using a graph theoretical technique. In the
discussion below, a path is sought in a directed graph with m vertices, each
corresponding to one of the m rows of the matrix RA, and where the edge
$i \to j$ corresponds to a nonzero entry $a_{ji}$. Note that the existence of an edge
from i to j means that if row i is currently in $A_1$, row j could replace
row i and $A_1$ will still have a zero-free main diagonal. If row j had previously
been in $A_2$ and had fewer nonzeroes than row i, then exchanging row i and row
j would have the desired effect of reducing the number of nonzeroes in $A_1$ while
preserving the zero-free main diagonal.

In general, a path corresponding to a desirable row permutation is of the form

$$i_1 \to i_2 \to \cdots \to i_k$$

where all vertices except $i_k$ correspond to rows in $A_1$ and row $i_k$ has fewer
nonzeroes than row $i_1$. The cyclic row permutation defined by this path would
replace each row by the row after it on the path and would replace row $i_k$ by row
$i_1$. This would result in a net exchange of one row between $A_1$ and $A_2$. The
permutation corresponding to each such path is performed as the path is found.


The path can end without defining a desirable permutation. This can happen
in either of two ways. The path may reach a vertex corresponding to a row in $A_1$ from
which there are no departing edges, or the path may reach a vertex corresponding to
a row in $A_2$ which does not contain fewer nonzeroes than the row at the beginning
of the path. If either event occurs, the last edge on the path is removed and a
replacement sought. This process is called backtracking.
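
The path search with backtracking can be sketched in Python as below. This is a simplified illustration rather than the authors' implementation: the function name `reduce_top_block`, the representation of rows as sets of column indices, the 0-based indexing, and the outer "repeat until no improving path exists" loop are all assumptions made for this example.

```python
def reduce_top_block(rows, n):
    """Greedily reduce the nonzero count of the top n x n block.

    rows[r] is the set of column indices of the nonzeroes in row r of RA
    (hypothetical input format); the first n rows are assumed to have a
    zero-free diagonal. Returns slot, where slot[p] is the index of the
    row that ends up in position p.
    """
    m = len(rows)
    slot = list(range(m))

    def has(p, col):              # does the row now at position p have a nonzero in col?
        return col in rows[slot[p]]

    def nnz(p):
        return len(rows[slot[p]])

    def find_path(p0):
        visited = {p0}

        def dfs(p):
            # Prefer ending the path in the lower block: shortest permissible path.
            for q in range(n, m):
                if has(q, p) and nnz(q) < nnz(p0):
                    return [p, q]
            # Otherwise continue through the top block; a failed branch returns
            # None, which is the backtracking step.
            for q in range(n):
                if q not in visited and has(q, p):
                    visited.add(q)
                    tail = dfs(q)
                    if tail is not None:
                        return [p] + tail
            return None

        return dfs(p0)

    improved = True
    while improved:
        improved = False
        for p0 in range(n):
            path = find_path(p0)
            if path is None:
                continue
            # Cyclic permutation: each position receives the row after it on the path.
            first = slot[path[0]]
            for a, b in zip(path, path[1:]):
                slot[a] = slot[b]
            slot[path[-1]] = first
            improved = True
    return slot


# Rows of the matrix in (2.1), written as sets of 0-based column indices.
RA = [{0, 1, 2}, {1, 2, 3}, {0, 2}, {3}, {0, 1}, {1}]
print(reduce_top_block(RA, 4))   # prints [4, 5, 2, 3, 0, 1], the row order of (2.2)
```

Each accepted path strictly lowers the nonzero count of the top block, so the loop terminates. Because this sketch tries lower-block vertices first, it finds different (shorter) paths than the longer chain worked through in the text, but on this example it arrives at the row order of (2.2).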

The following path could be constructed using the matrix in (2.1):

$$1 \to 3 \to 2 \to 5$$

The path ends at 5 since 5 > 4 = n, and the fifth row contains fewer nonzeroes
than does the first row. Performing the reassignment indicated by this chain
involves replacing row 1 by row 3, row 3 by row 2, row 2 by row 5, and row 5 by row
1. The resulting matrix is shown below.

$$R'RA = \left[\begin{array}{cccc} 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ \hline 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{array}\right] \tag{2.4}$$

Performing the permutation results in a net gain of one zero for $A_1$. At each
stage in the construction of a path, the algorithm will attempt to add a vertex
corresponding to a row in $A_2$ before adding one corresponding to a row in $A_1$,
thus tending to find the shortest and simplest permissible path. Note that Step 2
is not yet complete, for rows 2 and 6 should be interchanged to minimize the number
of nonzeroes in the $A_1$ block of R'RA.

Block Triangularizing

The square submatrix $A_1'$ can be permuted to upper block triangular form by the
modified Harwell subroutine MC13D (Duff and Reid [1978]) discussed earlier. No
special modifications need be made for the rectangular case since $A_1'$ is square.
The user must remember, however, to permute the columns of $A_2'$ as well as
symmetrically permuting the rows and columns of $A_1'$.


Row  Permutation  to  Final  Form 


After the matrix has been permuted so that $A_1^*$ is block upper triangular with
square indecomposable diagonal blocks, the rows of the lower submatrix $A_2^*$ must be
permuted into the $A_1^*$ matrix in such a manner as to obtain the block upper trapezoidal
form. This will make some of the diagonal blocks rectangular and will remove
all nonzeroes from beneath the diagonal blocks. Each row in $A_2^*$ is examined to
determine the column index of its first nonzero. The row is then inserted into
$A_1^*$ just above the row containing the transversal element in that column. If
another row in $A_2^*$ has its first nonzero in the same column, this row is inserted
into $A_1^*$ just above the row previously inserted. For example, this method would
permute the matrix in (2.3) in the following manner.

$$\begin{bmatrix} 1 & 0 & 1 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{2.5}$$
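
The insertion rule can be sketched in Python as follows. The sketch is illustrative only; the function name `insert_lower_rows` and the dense list-of-lists representation are assumptions, and each lower row is assumed to contain at least one nonzero.

```python
def insert_lower_rows(top, bottom):
    """Merge the rows of the lower block into the block upper triangular top block.

    top is the n x n block with a zero-free diagonal, so row c holds the
    transversal element of column c; bottom holds the remaining m - n rows.
    Both are lists of dense rows (hypothetical representation).
    """
    n = len(top)
    # groups[c] collects, top to bottom, the rows that will sit together
    # ending with the transversal row of column c.
    groups = [[list(top[c])] for c in range(n)]
    for row in bottom:
        # Column index of the first nonzero in this lower row.
        c = next(j for j, v in enumerate(row) if v != 0)
        # Insert just above the transversal row, or above any row already inserted.
        groups[c].insert(0, list(row))
    return [row for group in groups for row in group]


top = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # upper block of (2.3)
bottom = [[1, 1, 1, 0], [1, 0, 1, 1]]                            # lower block of (2.3)
for row in insert_lower_rows(top, bottom):
    print(row)
```

For the matrix in (2.3) both lower rows have their first nonzero in column 1, so they are stacked above the first row, giving a 3 x 1 leading diagonal block followed by three 1 x 1 blocks.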


The final block upper trapezoidal structure of a given matrix is of course not
unique. In general there are several possible maximal transversals yielding several
different $A_1'$ submatrices, each having the minimum number of nonzeroes possible.
Lack of uniqueness of the final block structure is also due to the fact that there
may be more than one way in which the diagonal blocks may be permuted among themselves
and still preserve the block upper trapezoidal form. Some of these permutations
are more desirable than others, as they result in a "dual angular" form as described
in Golub and Plemmons [1980].


III. COMPARISONS AND TEST RESULTS


1. Test Problems. The relative performance of four methods for ordering a
large sparse system of linear equations prior to solution of the system using Givens
rotations was compared. The four ordering options used were: 1) no ordering at
all, 2) quotient minimum degree ordering, 3) nested dissection, and 4) block trapezoidal
ordering. These four ordering options were compared on four sparse systems
of equations, of sizes 75 x 50, 100 x 75, 150 x 100, and 892 x 261. The
first three examples were constructed by the authors with known solutions in order
to check the accuracy of the programs. The fourth example consisted of actual
geodetic network data obtained by the U.S. National Geodetic Survey. All programs
were written in IBM H Extended Fortran and run on an IBM 370/3031 computer.

2. Software Used. The basic software package used in this project was a double
precision version of SPARSPAK, a sparse matrix package of subroutines written and
documented by George and Liu [1978, 1981]. This package was extended by George
and Heath [1980] to provide for solving a sparse least squares problem using Givens
rotations in conjunction with quotient minimum degree ordering. Harwell subroutines
MC21A and MC13D, developed by Duff [1978] and Duff and Reid [1978], were adapted
and used by Litsey [1980] in his implementation of the block trapezoidal ordering
method. In this project, the authors combined and modified all of the above software
in order to test and compare the four ordering options in conjunction with the
Givens method for solving least squares problems by orthogonal decomposition.

The "no ordering" option was tested in order to provide a benchmark against
which to compare the effectiveness of the other three ordering methods. The quotient
minimum degree ordering scheme attempts to minimize fill-in by reordering an
original n x n matrix A in the following way: at each stage, if columns
1, ..., k have been selected already for the reordered matrix, then the column
in the remaining (n - k) x (n - k) submatrix with the fewest number of nonzeroes
is selected as the (k+1)st column in the reordered matrix. The nested dissection
ordering method attempts to permute the matrix A in such a way that it can be
broken down recursively into subblocks which are connected in a well-defined way.
As a result of this dissection process, zero blocks are formed which remain zero
after the reordered matrix is factored, thus reducing the fill-in.

3. Format of Data. Data for sparse matrices is entered according to one of
two schemes. In the first scheme, each non-zero entry of the matrix is entered as
a triple consisting of its row index, column index, and value. Triples may be
entered in any order, subject to the condition that all entries of a given row must
be entered together as a group. The matrix is then stored in four arrays: arrays ICN
and VALUE contain column indices and corresponding values of all non-zero elements in
the matrix; array element IP(I) points to the position in ICN where column
indices for the Ith row begin, and array element LENR(I) gives the number of non-zero
entries in the Ith row. The Harwell-block trapezoidal ordering code requires
this storage scheme. SPARSPAK routines, as adapted for Givens rotations, utilize
a different scheme for storing a sparse matrix. They accept one row of the matrix
at a time, according to the format NSUBS, (SUBS(K), VALUE(K), K = 1, NSUBS) where
NSUBS is the number of non-zero entries in the row, SUBS(K) is the column index of
the Kth non-zero entry in the row, and VALUE(K) is the corresponding value of that
entry.
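
The first storage scheme can be illustrated with a short Python sketch that builds the four arrays from a list of triples. The function name `to_row_storage` is hypothetical, and the sketch keeps the 1-based row, column, and position values of the Fortran arrays inside ordinary Python lists.

```python
def to_row_storage(triples, m):
    """Build ICN, VALUE, IP and LENR from (row, column, value) triples.

    Row and column indices are 1-based, as in the Fortran arrays. All triples
    of a given row are assumed to arrive together as a group; rows themselves
    may arrive in any order. IP(I) is left at 0 for a row with no entries.
    """
    icn, value = [], []
    ip = [0] * m
    lenr = [0] * m
    for i, j, v in triples:
        if lenr[i - 1] == 0:
            ip[i - 1] = len(icn) + 1    # 1-based position where row i starts in ICN
        icn.append(j)
        value.append(v)
        lenr[i - 1] += 1
    return icn, value, ip, lenr


# Row 2 is entered before row 1; each row's entries are grouped.
icn, value, ip, lenr = to_row_storage([(2, 1, 5.0), (2, 3, 6.0), (1, 2, 7.0)], 2)
print(icn, value, ip, lenr)   # prints [1, 3, 2] [5.0, 6.0, 7.0] [3, 1] [1, 2]
```

Row I of the matrix then occupies positions IP(I) through IP(I) + LENR(I) - 1 of ICN and VALUE.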

In  addition  to  providing  the  sparse  coefficient  matrix  according  to  the  correct 
format,  one  must  input  for  each  equation  its  right-hand  side  and  a  possible  weighting 
factor  for  the  equation  (in  the  context  of  least  squares). 

4. Organization of Computer Programs. The basic program used in this project
consists of two job steps, as indicated in Figure 1. In job step 1, data for a
sparse system of equations is read according to its original format and converted
to an appropriate format for the next stage (Step A). For ordering options other
than block trapezoidal ordering, the data is stored on a disk according to the
SPARSPAK row-by-row format and the data is passed to job step 2 (Step C). For the
block trapezoidal ordering, the original data is converted to the Harwell scheme,
ordering is performed (Step B), and the resulting matrix is then converted to
SPARSPAK format on the disk before being passed to step 2.


Figure  1.  Schematic  View  of  Program 

Job step 2 is executed in a SPARSPAK environment under control of a main driver
routine. After the zero-nonzero structure of the coefficient matrix is read in
(Step D), one of three ordering options is selected: the system is ordered by
quotient minimum degree ordering (Step E), nested dissection (Step F), or no ordering
is performed (Step G). After rewinding the reordered data set on the disk, numerical
values for the system are read one row at a time, each row is reduced by Givens
rotations to form the Cholesky factor R (Step H), and the least squares solution
to the system is computed (Step I). Finally, statistics provided by SPARSPAK are
printed out (Step J).
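
The row-at-a-time reduction of Step H can be illustrated with a small dense Python sketch. It is not the SPARSPAK code: the function name `givens_row_update` and the dense list-of-lists storage of R are assumptions, and the right-hand side, weights, and sparse data structures are omitted.

```python
import math


def givens_row_update(R, row):
    """Rotate one new row of the matrix into the upper triangular factor R, in place.

    R is an n x n upper triangular matrix stored as a list of lists
    (initially all zeros); row is the dense new row of length n.
    """
    n = len(R)
    row = list(row)
    for k in range(n):
        if row[k] == 0.0:
            continue                      # nothing to annihilate in this column
        r = math.hypot(R[k][k], row[k])   # r > 0 because row[k] != 0
        c, s = R[k][k] / r, row[k] / r
        for j in range(k, n):
            t = c * R[k][j] + s * row[j]
            row[j] = -s * R[k][j] + c * row[j]
            R[k][j] = t
    return R


R = [[0.0, 0.0], [0.0, 0.0]]
for a_row in ([3.0, 1.0], [4.0, 2.0]):
    givens_row_update(R, a_row)
print(R)   # approximately [[5.0, 2.2], [0.0, 0.4]]
```

After all rows have been processed, the product of the transpose of R with R equals the normal-equation matrix formed from the rows read in, which is why R is referred to as a Cholesky factor.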

The  four  sequences  of  execution  steps  listed  below  correspond  to  the  four 
ordering  options  named: 

no  ordering 

quotient  minimum  degree 
nested  dissection 
block  trapezoidal 

5. Basis for Comparison of Methods. The SPARSPAK package provides statistics
which estimate the storage and execution time required for solving a given system
of linear equations. Execution times (in seconds) are reported for each of four
individual steps: ordering (finding permutation matrix P to obtain $PAP^T$), storage
allocation (set up data structure for storing the Cholesky factor R of $PAP^T$),
factorization (numerically factor $PAP^T$ into $R^T R$), and triangular solution (solve
$R^T y = Pb$, $Rz = y$, and set $x = P^T z$). Storage requirements are given for the
ordering, allocation, and factorization/solution steps. Also, operation counts
are given for the factorization and triangular solution steps; actual values are
manipulated in these two steps. An operation is defined to be a multiplication or
division since most arithmetic operations in matrix computations occur as a multiply-add
pair. Final fill-in is said to occur whenever the Cholesky factor R has a
non-zero element in a position which contains a zero element in the upper triangular
portion of the matrix $PAP^T$. The amount of intermediate fill-in along with the
final fill-in are reflected in the number of operations for factorization.


6. Test Results. Table I contains selected results for the four systems of
equations Ax = b which were tested. The size of sparse matrix A is given at
the top of each column.


IV.  OBSERVATIONS 


1. Test results reported in Table I indicate that the block trapezoidal ordering
scheme  performs  quite  well  in  reducing  both  intermediate  and  final  fill-in  to  the 
observation  matrix  during  the  orthogonal  decomposition.  For  these  four  test  problems, 
the  minimum  degree  algorithm  performed  slightly  better  than  the  other  two  ordering 
schemes.  For  the  geodetic  network  problem,  all  three  ordering  methods  resulted  in  a 
ten-fold  or  greater  reduction  in  factorization  time,  operations  during  factorization, 
and  final  fill-in. 

2.  With  regard  to  ordering  time,  the  block  trapezoidal  algorithm  performed 
considerably  worse  than  the  other  schemes.  We  feel  that  this  is  due  in  large  part 
to  the  fact  that  the  software  for  the  minimum  degree  and  nested  dissection  algorithms 
is  well-developed  and  optimized  whereas  the  software  for  the  block  trapezoidal 
algorithm  is  still  in  a  rough  state;  further  work  in  developing  this  software  is 
underway.

TABLE I

Test Results
(times in seconds)

                                            75 x 50    100 x 75   150 x 100     892 x 261

No ordering
 1. Fill-in (approximate)                       182         209         893          7884
 2. Operations for factorization            100,314     106,186     387,445    37,211,056
 3. Time for allocation                        .326        .415        .532         1.301
 4. Time for factorization and solution       1.084       1.223       3.274       246.728

Minimum Degree
 1. Fill-in (approximate)                         0           0         254             0
 2. Operations for factorization             46,162      49,288     185,588     3,128,844
 3. Time for ordering and allocation           .565        .669       1.112         6.593
 4. Time for factorization and solution        .676        .778       1.930        27.187

Nested Dissection
 1. Fill-in (approximate)                        64          66         512           429
 2. Operations for factorization             65,063      63,597     287,323     3,056,212
 3. Time for ordering and allocation           .359        .491        .608         1.459
 4. Time for factorization and solution        .324        .881       2.680        26.303

Block Trapezoidal
 1. Fill-in (approximate)                       134         185         512           226
 2. Operations for factorization             60,032      79,446     284,555     4,386,519
 3. Time for ordering and allocation          1.853        4.33       3.150       219.830
 4. Time for factorization and solution        .776        .975       2.651        37.720


TRANSVERSE CURRENTS AND OHMIC LOSSES OF
MICROSTRIP OBTAINED FROM A MATRIX FORMULATION
WHICH FACILITATES THEIR NUMERICAL CALCULATION


Peter J. McConnell and Robert L. Brooke
US  Army  Mobility  Equipment  R&D  Command 
Fort  Belvoir,  Virginia  22060 


ABSTRACT: A matrix formulation is developed for calculating the transverse
current distribution of microwave transmission lines and applied to solving for
frequency dependent impedance and loss of microstrip commonly used for antenna
feedlines and in microwave construction. The solution is based on calculating
the effective inductance per unit length, and is unique in providing the frequency
dependence and ohmic losses for any geometry. Results for specific geometries
are compared favorably with earlier capacitance based solutions. The frequency
dependence of current distribution and characteristic impedance will be shown for
two commonly employed geometries.

1. INTRODUCTION. The characteristic impedance of Microstrip Transmission
Lines has been of interest for twenty-five years. An excellent compilation of
selected papers is contained in reference (1) including several early papers
specifically addressing Microstrip. The papers by Cohn (2), Wheeler (3), and
Bryant and Weiss (4) are especially fundamental and useful for common engineering
problems. These papers, and a multitude of others published since, depend on
solving for the static capacitance per unit length of the selected line configuration.
Most practical geometries do not lend themselves to an exact analytic solution
so much effort has been devoted to developing approximate analytic solutions.
With the advent of the high speed computer a considerable effort has been devoted
to developing numerical techniques for solving Laplace's equation to yield the
electric field configuration and capacitance of useful geometries. Reference
(5) is a notable example of this approach.

The approach used in this paper, while nonanalytic, is quite general and
without any geometric limitation in the transverse plane. This approach is
unique in that the effects of finite conductor losses and frequency dependence
can be included in the analysis. Assumptions made by other authors are also
made here. The lines to be analyzed are assumed to be relatively low loss lines
supporting quasi-TEM modes. Capacitance based solutions are static solutions
which approach exactness only for lossless lines. The inductance based solution
to be developed and applied here can allow for losses but is quasi-static and
retarded potentials have not been considered.

2. THEORETICAL DEVELOPMENT. If the line has finite losses, the characteristic
impedance and propagation constant, $\gamma$, can be calculated from

$$Z_0 = \sqrt{\frac{R + j\omega L}{G + j\omega C}}$$

$$\gamma = \alpha + j\beta = \sqrt{(R + j\omega L)(G + j\omega C)}$$

where R and G are the series resistance and shunt conductance per unit length of
the line and $\alpha$ and $\beta$ are the attenuation and phase constant of the line. With
the appropriate modifications to account for a normal low loss transmission line
and other reasonable assumptions given in reference (7), an alternate equation
for characteristic impedance results in the form

$$Z_0 = vL$$

where L is the inductance per unit length and is directly affected by both the
resistance of the line and frequency.

The geometry to be solved is shown in Figure 1.

Figure 1

Transverse Geometry


For generality the two conducting tapes are allowed to have differing widths;
however, they will be maintained parallel and with their mid-points defining a
plane normal to both. This is a simplification, and is not required, but will
shorten computation time substantially. This configuration can be used to
represent microstrip examples given in references (3) and (4), and direct
comparisons of results made for limiting cases of high frequency and no loss. In
addition, this arrangement allows for calculating the parameters of antenna
feedlines where the widths are the same as well as microwave components where the
ground plane has a known finite width.

Mathematically subdividing the conductors into smaller parallel sections
is accomplished as shown in Figure 2. This method of subdivision is arbitrary
and is retained for consistency with reference (7).

Figure 2

Method  of  Indexing  Subdivisions 


The two conducting tapes have now been replaced by 4n thin parallel tapes, each
of which may carry a different current. An equivalent circuit of the transmission
line then looks like that in Figure 3.


Figure  3 

Equivalent Circuit

The width of each subsection will be chosen sufficiently small to consider
the current density in each to be uniform. DC inductance equations are available
to calculate the mutual inductance between any two subsections and the self
inductance of each. The resistance of each section will be the dc resistance
calculated from the input parameters of bulk resistivity and incremental area.

The  incremental  area  is  defined  by  the  smaller  of  the  actual  tape  thickness  or 
an  arbitrary  multiple  (an  input  parameter)  of  the  skin  depth,  and  the  subsection 
width.  In  this  paper,  the  upper  and  lower  tapes  are  each  divided  into  2n  equal 
width  subsections.  References  (7)  and  (8)  discuss  alternative  methods  used  to 
speed  convergence  for  different  geometries. 

The current in each element can be expressed in terms of an arbitrary applied
voltage, resistive voltage drops, and induced voltages due to all current elements.
This leads to a set of linear equations which can be numerically solved for the
current in each subsection. The matrix algebra is tedious and will not be given
here. The effective inductance per unit length can then be found as


Leff  = 


2v  1  Sk 


(Iak)2  +  (Xbk): 


and  the  effective  resistance  as; 


Reff  = 


2  1  \ 


(Iak)2  +  (Ebk)2 


where $a_k$ and $b_k$ are the in-phase and quadrature components of current in each
subsection.
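The subdivision procedure can be illustrated with a short sketch. It is not the author's program. It assumes infinitely thin subsections, approximates each mutual inductance per unit length by the parallel-filament logarithmic formula (with the geometric mean distance of a thin strip for the self term), and imposes a unit total current on the upper tape with an equal return current on the lower tape instead of an arbitrary applied voltage. All names and default values (`tape_pair`, `rho`, `thickness`, `skin_mult`) are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi          # permeability of free space, H/m
C0 = 299_792_458.0            # free-space velocity, m/s


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a complex system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def tape_pair(w_top, w_bot, h, n, freq, rho=1.7e-8, thickness=35e-6, skin_mult=1.0):
    """Return (R_eff, L_eff, currents) per unit length for two parallel tapes."""
    omega = 2 * math.pi * freq
    skin = math.sqrt(2 * rho / (omega * MU0))
    t_eff = min(thickness, skin_mult * skin)      # incremental-area thickness
    m = 2 * n                                     # subsections per tape
    seg = []                                      # (x, y, width) of each subsection
    for w, y in ((w_top, h), (w_bot, 0.0)):
        dw = w / m
        seg += [((k + 0.5) * dw - w / 2, y, dw) for k in range(m)]
    N = len(seg)
    A = [[0j] * (N + 2) for _ in range(N + 2)]
    for i, (xi, yi, wi) in enumerate(seg):
        for j, (xj, yj, _) in enumerate(seg):
            # Self term uses the geometric mean distance of a thin strip.
            d = wi * math.exp(-1.5) if i == j else math.hypot(xi - xj, yi - yj)
            A[i][j] = 1j * omega * (-MU0 / (2 * math.pi)) * math.log(d)
        A[i][i] += rho / (wi * t_eff)             # dc resistance of the subsection
        v_col = N if i < m else N + 1             # unknown voltage drop of its tape
        A[i][v_col] = -1
        A[v_col][i] = 1                           # total-current constraint row
    rhs = [0j] * N + [1, -1]                      # +1 A on top tape, -1 A on bottom
    sol = solve_linear(A, rhs)
    z_loop = sol[N] - sol[N + 1]                  # loop impedance per unit length
    return z_loop.real, z_loop.imag / omega, sol[:N]


r_eff, l_eff, currents = tape_pair(1e-3, 1e-3, 1e-3, n=10, freq=10e9)
print("Z0 estimate (ohms):", C0 * l_eff, " R_eff (ohm/m):", r_eff)
```

Under these assumptions the equal width case with w/h = 1 should land in the neighbourhood of the corresponding Table 1 entry, and the solved currents are larger at the tape edges than at the centre.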

3. RESULTS & COMPARISON. Two cases were calculated and compared with
results produced by other authors. An equal width case was compared directly
with the results of reference (3), and an unequal width case with the results of
reference (4), provided the larger tape is at least ten times the width of the
smaller, as the latter reference assumes an infinite ground plane. Only the high
frequency or lossless case will be considered since this is also an assumption of
the references. The comparative values obtained from the references required
interpolating published response curves. The agreement is excellent, better than
would have been expected. The slightly high bias of the results in Table 2 is
probably the result of the finite dimension of the larger tape. The program
calculations have been rounded to the nearest ohm.
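The size of the agreement can be checked directly from the tabulated values. The short sketch below uses the numbers of Tables 1 and 2 as printed in this paper; only the helper name `percent_diff` is an assumption.

```python
# (w/h, reference value, this method), in ohms, as listed in Tables 1 and 2.
TABLE_1 = [(0.3, 315, 313), (0.4, 279, 280), (0.5, 252, 254), (0.6, 232, 234),
           (0.7, 216, 217), (0.8, 202, 202), (0.9, 189, 190), (1.0, 178, 179),
           (3.0, 87, 87), (10.0, 32, 32.5)]
TABLE_2 = [(0.6, 156, 160), (0.8, 140, 142), (1.0, 127, 128),
           (2.0, 87, 90), (3.0, 67, 70)]


def percent_diff(reference, computed):
    """Signed difference of the computed value from the reference, in percent."""
    return 100.0 * (computed - reference) / reference


for name, table in (("Table 1", TABLE_1), ("Table 2", TABLE_2)):
    diffs = [percent_diff(ref, val) for _, ref, val in table]
    print(name, "largest |difference| in percent:", max(abs(d) for d in diffs))
    print(name, "all differences positive:", all(d > 0 for d in diffs))
```

Every Table 2 difference is positive, which is the slightly high bias noted above, while the Table 1 differences have mixed sign.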

The current distributions produced in solving for the results of Table 1
are shown superimposed in Figure 6 for values of w/h from one to thirty. It is
clear that widening the two conductors results in a more uniform distribution of
current in the transverse plane and better shielding. This of course is what
one should expect, and is used as the basis in reference (3) for a wide tape
approximation, and to establish a limit for the effective dielectric constant
of $K_{diel}$ and propagation velocity of $v_c/\sqrt{K_{diel}}$.


TABLE 1

COMPARISON OF RESULTS FOR EQUAL WIDTH (K=1)
(characteristic impedance in ohms)

  w/h    WHEELER (REF 3)    THIS METHOD

  0.3         315               313
  0.4         279               280
  0.5         252               254
  0.6         232               234
  0.7         216               217
  0.8         202               202
  0.9         189               190
  1.0         178               179
  3.0          87                87
 10.0          32                32.5

TABLE 2

COMPARISON OF RESULTS FOR UNEQUAL WIDTH (K=1)
(characteristic impedance in ohms)

  w1/h            BRYANT & WEISS    THIS METHOD
  (w2/w1 = 10)    (REFERENCE 4)

  0.6                 156               160
  0.8                 140               142
  1.0                 127               128
  2.0                  87                90
  3.0                  67                70

TRANSVERSE CUT


EQUAL TAPE CURRENT DISTRIBUTION
AS A FUNCTION OF WIDTH/SEPARATION


4. CONCLUSION. In this report we have shown a method to calculate the
inductance per unit length and ohmic losses of microstrip line with general
cross-sectional geometry. A program, developed to apply this technique, was
exercised for simple cases and the results found to agree with those of other
authors using capacitance based solutions. This approach is unique in that it
directly provides the transverse current distribution and ohmic attenuation at
all frequencies. It can provide new insights into factors which cause loss in
microwave components and help explain the effective behavior of currents on
extended antenna structures. A clear understanding of antenna feedlines and
radiating elements can only be reached if the current distribution is known.

For a more complete development of this approach and a detailed examination
of frequency dependence of current, impedance, and loss, the reader is referred
to reference (11).

ASPECTS OF ALGEBRAIC COMPUTATION


B. F. Caviness¹

General Electric Research and Development Center
Schenectady, NY 12345

ABSTRACT  In this brief paper we give some examples of the current state of
algebraic computation plus some references for further reading on applications
and the design of algebraic algorithms. The appendix contains a short
directory of computer algebra systems.


1. INTRODUCTION  Algebraic (or symbolic) computation is a type of scientific
computing in which computations are carried out with algebraic and other
symbols in addition to numeric entities. Also, typically, the computations are
carried out exactly, unlike most numerical calculations where computations are
carried out using approximate arithmetic. In this short paper we will give
some examples of the capabilities of current computer algebra systems, note
some applications, and suggest what the future holds.
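The distinction between exact and approximate arithmetic can be seen in a few lines. The sketch below uses Python's standard `fractions` module purely as an illustration; it is not one of the algebra systems discussed in this paper.

```python
from fractions import Fraction

# Approximate (binary floating-point) arithmetic: 0.1 and 0.2 are rounded on input.
print(0.1 + 0.2 == 0.3)                                       # False

# Exact rational arithmetic: no rounding ever takes place.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))   # True
print(Fraction(1, 3) + Fraction(1, 6))                        # 1/2
```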


2. EXAMPLES OF CURRENT CAPABILITIES  Some examples will help to clarify what
is meant by algebraic computation, to distinguish it from the more commonly
known numeric computation, and to indicate the scope and abilities of current
systems. It is possible using algebraic systems to calculate indefinite
integrals such as $\int \sin x \, dx$ and obtain the result $-\cos x$. It is also possible
to calculate some definite integrals exactly. For example, using Macsyma (see
the end of the demonstration given below) [MATH77] one can ask to calculate
$\int_{-\infty}^{\infty} \frac{\sin x}{x} \, dx$ and receive the exact answer, $\pi$. This integral can, of course, also
be calculated using numerical techniques, but one then obtains an approximate
answer.

To further indicate the scope and abilities of modern algebraic computation
systems, we give below a copy of a session using Macsyma. The lines
labelled (C1), (C2), ... are inputs typed by the user. The corresponding
output lines produced by the computer are labelled (D1), (D2), .... The CPU
time required for each computation is given just before each computer
response. The times, given in milliseconds, are for a PDP-KL10 at MIT. Lines
enclosed in /* ... */ brackets are comments typed by the user.

(C1) /* Read pretyped file of commands */
demo(bfc,demo1);

(C2) /* Display CPU time */
showtime:true$

Time= 2 msec.

(C3) /* Type and display function */
X/(X**3+1);
Time= 9 msec.
            X
(D3)      ------
           3
          X  + 1


1. Present address: Department of Computer and Information Sciences,
University of Delaware, Newark, Delaware 19711


(C4) /* Add previous function to itself. % refers to the immediately
preceding expression, D3 in this case. */

Time= 3 msec.
           2 X
(D4)      ------
           3
          X  + 1


(C5) (X+3)**20;
Time= 8 msec.
                 20
(D5)      (X + 3)

(C6) /* Expand the previous expression */

RATSIMP(%);

Time= 100 msec.
       20       19         18          17           16            15
(D6)  X   + 60 X   + 1710 X   + 30780 X   + 392445 X   + 3767472 X

                 14              13              12               11
     + 28256040 X   + 169536240 X   + 826489170 X   + 3305956680 X

                    10                9                8                 7
     + 10909657044 X   + 29753610120 X  + 66945622770 X  + 123591918960 X

                     6                 5                 4                 3
     + 185387878440 X  + 222465454128 X  + 208561363245 X  + 147219785820 X

                    2
     + 73609892910 X  + 23245229340 X + 3486784401
/* Note the large integers in the above expression that occur without
any truncation */

(C7) /* Differentiate the previous expression */

DIFF(%,X);
Time= 198 msec.
          19         18          17           16            15             14
(D7)  20 X   + 1140 X   + 30780 X   + 523260 X   + 6279120 X   + 56512080 X

                  13               12               11                10
     + 395584560 X   + 2203971120 X   + 9917870040 X   + 36365523480 X

                     9                 8                 7                 6
     + 109096570440 X  + 267782491080 X  + 535564982160 X  + 865143432720 X

                      5                  4                 3                 2
     + 1112327270640 X  + 1112327270640 X  + 834245452980 X  + 441659357460 X

     + 147219785820 X + 23245229340


(C8) /* Now factor it */

FACTOR(%);

Time= 1186 msec.
                    19
(D8)      20 (X + 3)

(C9) /* This shows a numerical capability of the system. %e is the
constant e. */

%e**x**3;

Time= 10 msec.
             3
            X
(D9)      %E

(CIO)  ROMBERGtev^)  ,X,1,2)  j 

/*  This  computation  requires  some  programs  to  be  loaded  from  disk  */ 

ROMBRG  FASL  DSK  MACSYM  being  loaded 
Loading  done 

NUMER  FASL  DSK  MACSYM  being  loaded 
Loading  done 
Time= 1165 msec.

(D10)  275.51098 

(C11) /* Macsyma has several routines for manipulating
series of various kinds. The following is a Taylor series. */

TAYLOR(SIN(X),X,0,9);

HAYAT FASL DSK MACSYM being loaded
Loading done
Time= 71 msec.
               3    5      7       9
              X    X      X       X
(D11)/T/  X - -- + --- - ---- + ------ + . . .
              6    120   5040   362880


(C12) /* Taylor can also compute Laurent expansions */

TAYLOR(1/(COS(X)-SEC(X))**3,X,0,5);

EULBRN FASL DSK MAXOUT being loaded
Loading done
Time= 233 msec.
                                               2          4
            1     1       11      347    6767 X    15377 X
(D12)/T/  - -- + ---- + ------ - ----- - ------- - -------- + . . .
             6      4        2   15120   604800    7983360
            X    2 X    120 X

(C13) /* Macsyma can solve some systems of non-linear equations. Here
we compute exactly the six roots of unity. */

SOLVE(X**6-1);

SOLVE FASL DSK MACSYM being loaded
Loading done
Time= 1295 msec.
           SQRT(3) %I + 1      SQRT(3) %I - 1                 SQRT(3) %I + 1
(D13) [X = --------------, X = --------------, X = - 1, X = - --------------,
                 2                   2                              2

            SQRT(3) %I - 1
      X = - --------------, X = 1]
                  2

(C14) /* Now solve system of equations */

SOLVE([A*X+B*Y = 0,C*X+D*Y = 1],[X,Y]);

Time= 163 msec.
                 B                 A
(D14)  [[X = ---------, Y = - ---------]]
             B C - A D        B C - A D

(C15) /* Now define a matrix */

MATRIX([A,B,C],[D,E,F],[G,H,I]);
Time= 7 msec.
               [ A  B  C ]
               [         ]
(D15)          [ D  E  F ]
               [         ]
               [ G  H  I ]

(C16) /* and take its transpose */

TRANSPOSE(%);

Time= 6 msec.
               [ A  D  G ]
               [         ]
(D16)          [ B  E  H ]
               [         ]
               [ C  F  I ]


(C17) /* Now compute the matrix product of it and its transpose.
%TH(2) refers to the second previous expression, D15 in this
case. */

% . %TH(2);

MDOT FASL DSK MACSYM being loaded
Loading done
Time= 130 msec.
        [  2    2    2                                      ]
        [ G  + D  + A      G H + D E + A B  G I + D F + A C ]
        [                                                   ]
        [                   2    2    2                     ]
(D17)   [ G H + D E + A B  H  + E  + B      H I + E F + B C ]
        [                                                   ]
        [                                    2    2    2    ]
        [ G I + D F + A C  H I + E F + B C  I  + F  + C     ]

(C18) /* Create a Vandermonde matrix */

MATRIX([X**2,X,1],[Y**2,Y,1],[Z**2,Z,1]);

Time= 10 msec.
        [  2       ]
        [ X   X  1 ]
        [          ]
        [  2       ]
(D18)   [ Y   Y  1 ]
        [          ]
        [  2       ]
        [ Z   Z  1 ]

(C19) /* Now compute its determinant */

DETERMINANT(%);

Time= 39 msec.
                2       2    2     2      2
(D19)      - Y Z  - X (Y  - Z ) + Y  Z + X  (Y - Z)

(C20) /* Factor the determinant */

FACTOR(%);

Time= 750 msec.

(D20)      - (Y - X) (Z - X) (Z - Y)


(C21) /* Symbols can be given mathematical properties */

DECLARE(N,ODD)$

Time= 12 msec.

(C22) /* and then these are used in evaluating subsequent expressions */

COS(N*%PI/2);

Time= 56 msec.

(D22) 0

(C23)  F(X+Y); 

Time=  4  msec. 

(D23)  F(Y  +  X) 

(C24) /* Another such example */

DECLARE(F,LINEAR)$

Time= 3 msec.

(C25) F(X+Y);

Time= 9 msec.

(D25) F(Y) + F(X)

(C26)  /*  A  dramatic  example  of  "infinite-precision"  arithmetic  */ 

100!  ; 

Time=  34  msec. 

(D26)  93326215443944152681699238856266700490715968264381621468592963895217599# 
99322991560894146397615651828625369792082722375825118521091686400000000000000# 
0000000000 

(C27) /* Macsyma can also be used as a programming language. The following
code defines the factorial function. */

FAC(N):=IF N = 0 THEN 1 ELSE N*FAC(N-1);

Time= 5 msec.

(D27) FAC(N) := IF N = 0 THEN 1 ELSE N FAC(N - 1)

(C28) /* Now use our newly defined function. Compare the execution time
to the built-in factorial function used in C26. */

FAC(5);

Time= 54 msec.

(D28) 120

(C29) /* There are also facilities for large floating-point precision.
The following instruction sets the floating-point precision
to 50 decimal places. */

FPPREC:50$

FLOAT FASL DSK MACSYM being loaded
Loading done
Time= 41 msec.

(C30) /* Now print pi with this precision. */

BFLOAT(%PI);

Time= 53 msec.

(D30) 3.1415926535897932384626433832795028841971693993751B0

(C32) demo(bfc,demo2);

(C33) SHOWTIME:TRUE$

Time= 3 msec.


(C34) X/(X**3-1);
Time= 10 msec.
            X
(D34)     ------
           3
          X  - 1


(C35) /* Compute the indefinite integral of the previous expression. */
INTEGRATE(%,X);

SIN FASL DSK MACSYM being loaded
Loading done

SININT FASL DSK MACSYM being loaded
Loading done

SCHATC FASL DSK MACSYM being loaded
Loading done
Time= 419 msec.
                                 2 X + 1
               2            ATAN(-------)
          LOG(X  + X + 1)        SQRT(3)    LOG(X - 1)
(D35)   - --------------- + ------------- + ----------
                 6             SQRT(3)          3


(C36) /* Differentiate the result. */
DIFF(%,X);

Time= 67 msec.
                2               2 X + 1           1
(D36)   ------------------ - -------------- + ---------
                    2            2            3 (X - 1)
           (2 X + 1)         6 (X  + X + 1)
        3 (---------- + 1)
               3

(C37) /* Macsyma does not automatically simplify its results so we
must tell it to do so. */
RATSIMP(%);
Time= 79 msec.
            X
(D37)     ------
           3
          X  - 1

(C38) X*SIN(X)+%E**X**2+1/LOG(X);

Time= 28 msec.
                               2
                     1        X
(D38)   X SIN(X) + ------ + %E
                   LOG(X)


(C39) INTEGRATE(%,X);

RISCH FASL DSK MACSYM being loaded
Loading done

PFRAC FASL DSK MAXOUT being loaded
Loading done

ERF FASL DSK MAXOUT being loaded
Loading done

RPART FASL DSK MACSYM being loaded
Loading done
Time= 2488 msec.
        /
        [   1         SQRT(%PI) %I ERF(%I X)
(D39)   I ------ dX - ---------------------- + SIN(X) - X COS(X)
        ] LOG(X)                2
        /

/* In the above example Macsyma was unable to integrate the first term
in which case it simply inserts an integral sign in front of the
integrand. ERF denotes the error function. */

/*  To  gain  more  space  it  was  necessary  to  restart  Macsyma  for  the  next  demo. 
Thus  the  command  numbers  recycle.  */ 

(C9) /* Assign an expression to the variable F1. */

F1:SIN(X)/X;

Time= 14 msec.
          SIN(X)
(D9)      ------
            X


(C10) /* Now we demonstrate a simple graphics capability of Macsyma on a
character display. It would look nicer on a graphics display,
but the important point here is not the actual display on a
character terminal but the fact that such capabilities are
integrated into the system in a natural way. */

PLOTNUM:PLOTNUM1:50$
Time= 3 msec.


(C11) PLOT2(F1,X,-12,12);

APLOT2 FASL DSK SHARE being loaded
Loading done

TEKPLT FASL DSK SHARE being loaded
Loading done

FFORMA FASL DSK LIBLSP being loaded
Loading done

PRINT FASL DSK SHARE being loaded
Loading done

BFC 10:51:05 Monday, 1st June, 1981

[Character-terminal plot of SIN(X)/X; the plotted points are not recoverable.]

Xmin = -12  Xmax = 12  Ymin = -0.3  Ymax = 1


Time= 1909 msec.

(D11) DONE

(C12) INTEGRATE(F1,X,-INF,INF);

Time= 7507 msec.

(D12) %PI


The above examples display only a few of the facilities available in
systems such as Macsyma and Reduce [HEAA71]. There have been dozens of computer
algebra systems developed in the past fifteen years. For a directory of some
of the best-known and most widely available in the U. S. see the appendix.
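Several of the exact results in the session can be checked independently with arbitrary-precision integers and rationals. The sketch below is written in Python rather than Macsyma and manipulates coefficient lists instead of symbolic expressions; the helper names (`expand_binomial`, `differentiate`, `sin_taylor`) are illustrative assumptions.

```python
from fractions import Fraction
from math import comb, factorial


def expand_binomial(a, n):
    """Coefficients of (X + a)**n, highest power first (compare D6)."""
    return [comb(n, k) * a**k for k in range(n + 1)]


def differentiate(coeffs):
    """Derivative of a polynomial given highest power first (compare D7)."""
    degree = len(coeffs) - 1
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]


def sin_taylor(order):
    """Exact Taylor coefficients of sin(x) about 0, keyed by power (compare D11)."""
    return {k: Fraction((-1) ** ((k - 1) // 2), factorial(k))
            for k in range(1, order + 1, 2)}


p = expand_binomial(3, 20)
dp = differentiate(p)
# The factored form in D8 says the derivative is 20*(X + 3)**19.
print(dp == [20 * c for c in expand_binomial(3, 19)])
print(p[-1], dp[-1])
print(len(str(factorial(100))), "digits in 100!")
print(sin_taylor(9))
```

The integers grow without truncation here for the same reason they do in the session: the arithmetic is exact rather than fixed-width.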
Computer algebra systems have been used in hundreds of applications
including pure mathematics, celestial mechanics, general relativity, high
energy physics, NMR imaging, economics, acoustics, computer-aided design,
design of VLSI circuits, fluid mechanics, fracture mechanics, helicopter blade
motion, ship hull design, underwater shock waves plus many others. For
information on various applications see [PRF81, LEWV79, JENR76, NASA77, NGEW79, and
WANP81].

The success of today's systems has been made possible by important
improvements during the last decade in many fundamental computational
algorithms plus the discovery of algorithms for some problems such as indefinite
integration where no algorithm was previously known. Important progress has
been made on gcd computations, factoring, resultant computations,
simplification, integration, and solution of ordinary differential equations in closed
form. Most of the important papers on algorithmic advances can be found in
the proceedings of various ACM SIGSAM symposia [WANP81, NGEW79, JENR76,
PETS71]. Some notable exceptions are papers on integration by Risch [RISR69,
RISR70], the book by Davenport [DAVJ81], Singer's paper [SINM81] on solving
nth order homogeneous ordinary differential equations in closed form, Gosper's
paper [GOSR78] on summation of series, and the works of Musser [MUSD75], Wang
[WARO75, WANP78], Yun [MOYU73, YUND74], Zassenhaus [ZASH69], and Zippel
[ZIPR79] on polynomial factorization. The paper by Yun and Stoutemyer
[YUST80] gives a good survey of many aspects of algebraic computation.

In  the  future  we  will  see  continued  progress  on  new  algorithms,  continued 
progress  on  system  development,  and  the  appearance  of  powerful  scientific 
workstations  using  the  personal  computers  currently  appearing  on  the  market 
with  integrated  numeric,  algebraic  and  graphics  software. 


3. CONCLUSION  Dramatic advances are being made in scientific computation
today. By the year 2000, or perhaps sooner, the scientific computation world
of the average scientist or engineer will be significantly changed.
Essentially all the known mathematical computational methods used with pencil and
paper today will be programmed into personal workstations, putting the best
and latest techniques at the fingertips of each technical worker, thereby
giving him or her the ability to routinely solve problems that he or she was
previously unable to solve because of a lack of personal knowledge, computing power,
or both. Previously solvable problems will be doable in a fraction of the
scientist's time required today, thereby tremendously increasing the
productivity of all technical researchers.

Indeed  many  of  these  promises  are  here  today.  Some  laboratories  have 
already  made  algebra  systems  available  to  their  employees.  The  Navy  has  set 
up  a  network  for  the  use  of  Macsyma.  Other  organizations  are  planning  to  make 
algebra  systems  available  or  are  expanding  current  facilities  while  others  are 
just beginning to realize their great potential. This author believes that
nothing short of a revolution in scientific computation is underway!

APPENDIX

A Directory of Some Computer Algebra Systems

For a more complete directory see the article "Symbolic Mathematical
Computation" by David Yun and David Stoutemyer [YUST80].


ALDES/SAC-2. A highly portable, batch system with a growing list of
facilities. Detailed and accurate documentation. Available from
Prof. G. E. Collins, 1210 W. Dayton St., Department of Computer
Sciences, Univ. of Wisconsin, Madison, WI 53706.

ALTRAN. A highly portable, batch system restricted primarily to
rational function and truncated power series computations.
Excellent documentation and error diagnostics. Available from the
Computing Information Library, Bell Laboratories, 600 Mountain Ave.,
Murray Hill, NJ 07974.

MACSYMA. The most extensive of all the computer algebra systems. Runs
interactively under ITS on a PDP-10, under MULTICS on a Honeywell, and
under Berkeley UNIX on a VAX. Also available via the ARPA net. For
information contact Prof. Joel Moses or V. Ellen Golden, MIT Laboratory
for Computer Science, 545 Technology Sq., Cambridge, MA 02139.

muMATH. A microcomputer algebra system intended for educational and
personal use. Commercially available from The Soft Warehouse, P.O.
Box 11174, Honolulu, Hawaii 96828.

REDUCE. A portable, interactive system with many facilities. Has been
used for many applications, mostly in physics. Documentation weak.
Available from Dr. A. C. Hearn, Rand Corp., 1700 Main Street, Santa
Monica, CA 90406.

SCRATCHPAD. An interactive system under development at IBM Research.
Has many facilities. For information contact Dr. R. D. Jenks or
Dr. D. Y. Y. Yun, IBM T. J. Watson Research Center, P. O. Box 218,
Yorktown Heights, NY 10598.
NUMERICAL SOFTWARE FOR FIXED POINT MICROPROCESSOR APPLICATIONS
AND FOR FAST IMPLEMENTATION OF MULTIGRID TECHNIQUES


Steve  McCormick 
Department  of  Mathematics 
Colorado  State  University 
Fort  Collins,  Colorado  80523 


ABSTRACT. The first part of this paper will describe the considerations that
must be made for the development of numerical software routines for limited
environment microcomputer evaluation of elementary functions. Though the presentation has
broader implications, it is assumed that the target microcomputer is a single chip,
binary, fixed point, truncated arithmetic, and short wordlength device. The
application is assumed to demand a real-time, dedicated, special purpose device. The main
feature of this part of the paper is guidelines recommended for software development
in such an environment.

The second part of the paper will describe a very simplified viewpoint of
multigrid methods as single grid directional minimization algorithms for variationally
posed problems. This viewpoint leads to very simple, broad convergence theory, but
the purpose of this talk is to describe how it can be exploited to develop test code
for multigrid application. More specifically, this viewpoint suggests a means for
modifying existing relaxation routines in order to produce a multigrid simulator.
Such modifications involve only the relaxation process, can be implemented in a
surprisingly small amount of code, do not increase storage requirements nor impact
the data structure, and eliminate the need to determine the fine-to-coarse grid
transfer operator and coarse grid equations. Though somewhat less efficient than
the usual multigrid code, it offers a very quick way of applying multigrid to perhaps
a very large and complex existing software package. Included in the talk is a
description of a routine for solving general two-dimensional elliptic boundary value problems
on a rectangle. It was implemented in BASIC on a Hewlett Packard 9845 in about sixty
lines of code.

1.  NUMERICAL  SOFTWARE  FOR  FIXED  POINT  MICROPROCESSOR  APPLICATIONS.  The  first 
part  of  this  paper  concerns  the  task  of  implementing  numerical  software  in  a  very 
limited  microprocessor  environment.  The  focus  is  on  guidelines  for  development  of 
software  for  elementary  function  evaluation.  These  guidelines  have  evolved  during 
a  research  project  initially  supported  by  the  National  Science  Foundation  and  later 
by  the  U.  S.  Army  Research  Office.  It  is  the  culmination  of  the  effort  headed  by 
G.  Taylor,  M.  Andrews,  and  the  author.  Since  a  detailed  report  [1]  and  several 
research  papers  [2]-[6]  were  published  containing  the  details  of  this  subject,  we 
merely  paraphrase  the  introduction  of  [1]  for  our  purposes  here. 

The report focuses on numerical methods for limited environment microprocessor
implementation: the target microprocessor is assumed to be a single chip, binary,
fixed point, truncated arithmetic, short wordlength (16-bit or less) processor; and
the application is assumed to demand a real-time, dedicated, special-purpose device
(as opposed to an application-detached general purpose computer system) and to require
near full machine accuracy.
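To make the target environment concrete, the following sketch models a 16-bit, binary, fixed point format with truncated multiplication. The Q15 scaling (one sign bit, fifteen fraction bits) and the helper names are assumptions chosen for illustration; the report itself does not prescribe them here.

```python
import math

FRACTION_BITS = 15
ONE = 1 << FRACTION_BITS          # 32768 represents 1.0 (itself not representable)


def to_q15(x):
    """Quantize -1 <= x < 1 to a 16-bit two's-complement integer by truncation."""
    value = math.floor(x * ONE)
    if not -ONE <= value < ONE:
        raise OverflowError("value outside the Q15 range [-1, 1)")
    return value


def q15_mul(a, b):
    """Fixed point multiply: the 30-bit product is shifted back, dropping low bits.

    Python's >> on negative integers is an arithmetic shift, so truncation is
    toward minus infinity, as in two's-complement hardware.
    """
    return (a * b) >> FRACTION_BITS


half = to_q15(0.5)
print(q15_mul(half, half))        # 8192, i.e. 0.25 in Q15
print(q15_mul(1, 1))              # 0: the product of two tiny values is lost
print(q15_mul(-1, 1))             # -1: truncation is biased downward
```

The last two lines show why error analysis in truncated arithmetic differs from the rounded case: small products vanish and the error has a one-sided bias. Note that the sketch does not guard the single overflow case, (-1.0) times (-1.0).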

The main objective of the report is to present guidelines for the development
of software routines for evaluation of elementary functions. There is essentially
no reference to sources for acquiring existing software because such sources are
apparently nonexistent, although some sources seem to be on the horizon.

Narrowing the subject to elementary functions was essential. Although brief
comments are made that relate to the implementation of other numerical tasks such
as Fast Fourier Transforms, explicit computational problems (e.g., transforms,
matrix multiplies, and polynomial evaluations) represent too broad an area to
treat in such a report. Moreover, except for function evaluation, there is generally
too little known about solving implicit problems (e.g., inverse transforms, matrix
equations, and differential equations) in short wordlength fixed point arithmetic.

Even with the focus limited to elementary functions, there are certain
difficulties. First, existing and future microprocessors and applications are markedly
diverse in nature. Tradeoffs for accuracy, speed, system resource usage, and
reliability are complex and must be considered carefully for each development under
design. Second, there is much controversy surrounding predictions of the future of
microelectronics which complicates the task of presenting guidelines for design.
Third, there is a great variety of algorithms and forms of algorithm implementations
available. General recommendations are therefore difficult to make. Fourth, there
are many alternatives to machine language or microprogrammed implementation of
numerical algorithms including table look-up, existing numerical processor chips,
and special chip design with hardwired computation.

These difficulties dictate two philosophical features evident in the report.
First, the sample algorithms and implementations are not the best choice for every
environment, but should prove suitable for most in the limited setting defined
above. General comments and suggestions are made where appropriate so that one can
view the sample algorithms as illustrations of the general concepts. The second
feature is that the suggestions represent what can be done by implementing numerical
function evaluations in software. The report makes only brief reference to the
trade-offs with respect to the other alternatives. Thus, the comments should be
viewed as tools for system design that can be considered along with other
alternatives in light of the specific application.

Though  the  choice  of  software  implemented  function  evaluation  is  left  to  the 
decision  of  the  reader,  there  are  some  apparent  recommendations  made  in  the  report. 
Perhaps  the  future  will  involve  numerical  algorithms  implemented  in  customized 
hardwired  chip  designs,  in  chips  programmed  during  the  last  stages  of  fabrication, 
for  example.  For  such  implementations,  the  guidelines  in  the  report  would,  in  fact, 
be  relevant  for  designs  based  upon  these  alternatives.  But  for  the  present,  the 
main  competitors  to  software  implementation  are  table  look-up  and  floating  point 
chips.  The  sample  algorithms  presented  in  the  report  can  be  implemented  in  very 
compact  mode  (50  or  60  words  are  typical),  which  may  in  some  cases  be  implemented 
on  the  processor  chip  itself,  and  executed  at  a  speed  equivalent  to  at  most  a  few 
fixed  point  multiplies/divides.  (Note  that  multiply/divide  may  be  hardwired  or 
softwired,  depending  on  the  host  processor.)  This  is  significantly  faster  than 
existing  floating  point  chips,  although  the  speed  gap  will  narrow  dramatically  with 
the  introduction  of  faster  floating  point  processors.  Of  course,  the  longer  word- 
length  floating  point  chips  provide  greater  potential  accuracy.  Nevertheless, 
software  implementation  may  be  somewhat  cheaper,  although  program  development  costs 
must  be  accounted  for,  and  should  require  less  hardware  complexity.  On  the  other 
hand,  table  look-up  is  certainly  faster  than  either  alternative  and  attractive  when 
memory  demands  permit.  In  a  loose  sense,  then,  software  implementation  dominates 
16-bit  applications  while  the  table  look-up  and  floating  point  chip  alternatives 
are  more  competitive  in  shorter  and  longer  wordlength  applications,  respectively. 

The trade-offs between table look-up and software implementation for 8- and 16-bit
microprocessors are exemplified by the square root function treated in [2].

2.  NUMERICAL SOFTWARE FOR FAST IMPLEMENTATION OF MULTIGRID TECHNIQUES.

There  are  several  algebraic  interpretations  of  multigrid  methods  for  general  matrix 
problems.  (See  [7]  for  example  and  the  references  cited  therein.)  For  symmetric 
variationally posed problems, a very useful algebraic point of view is developed by
considering the coarse grid computations as they affect the fine grid approximation.

In  fact,  this  viewpoint  can  be  exploited  [8]  to  achieve  a  theory  including  rigorous 
and  sharp  rates  of  convergence  under  very  general  conditions.  However,  the  purpose 
of  this  second  part  of  the  paper  is  to  describe  how  this  viewpoint  provides  for  a  very 
fast  and  simple  method  of  implementing  multigrid  in  software.  More  precisely,  this 
point  of  view  defines  a  method  that  is  theoretically  equivalent  to  multigrid.  Though 
computationally  less  appealing,  it  can  be  implemented  with  minimal  design  effort  and 
in  very  short  code,  and  does  not  involve  much  impact  on  an  existing  software  package 
that  is  being  modified  for  multigrid  application.  We  begin  by  defining  this  method, 
which  we  suggestively  call  unigrid.  (See  [9]  for  more  details.) 

For focus, suppose A is an n x n real, symmetric, positive definite matrix.
With f in $R^n$, the problem is to find u in $R^n$ satisfying

(1)  $Au = f.$

Many iterative methods for solving (1) can be described as directional iterations in
the following sense. Given an approximation, U, in $R^n$ to u (such approximations will
always be represented in capitals), a direction d in $R^n$ is computed (the choice
of direction d defines the method) and used to update U in such a way that the new
residual error is orthogonal to d. More precisely, let $r = AU - f$. Then an iteration
with direction d is given by

(2)  $U \leftarrow U - s\,d, \qquad s = \frac{\langle r, d \rangle}{\langle Ad, d \rangle}.$

Here we use the arrow to denote replacement, thereby avoiding the use of subscripts
for iteration indices. We write (2) in the compact form

(3)  $U \leftarrow G_d(U).$
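The directional iteration (2)-(3) can be sketched in a few lines. The following is a minimal illustration in Python, not the authors' code; the function names and the small test matrix are assumptions made only for this example. It also shows the observation made in the text that a Gauss-Seidel sweep is a sequence of directional steps along the coordinate vectors.

```python
def matvec(A, x):
    """Dense matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    """Euclidean inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def directional_step(A, f, U, d):
    """Iteration (2): U <- U - s d with s = <r, d> / <Ad, d> and r = AU - f."""
    r = [au - fi for au, fi in zip(matvec(A, U), f)]
    s = dot(r, d) / dot(matvec(A, d), d)
    return [u - s * di for u, di in zip(U, d)]


def gauss_seidel_sweep(A, f, U):
    """One Gauss-Seidel sweep written as n directional steps, d = e_1, ..., e_n."""
    n = len(U)
    for i in range(n):
        e_i = [1.0 if j == i else 0.0 for j in range(n)]
        U = directional_step(A, f, U, e_i)
    return U


# Hypothetical 3 x 3 symmetric positive definite example.
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
f = [1.0, 0.0, 1.0]
d = [1.0, 2.0, 1.0]
U = directional_step(A, f, [0.0, 0.0, 0.0], d)
r_new = [au - fi for au, fi in zip(matvec(A, U), f)]
print(dot(r_new, d))  # the new residual is orthogonal to d, up to rounding
```

With d = e_i the step reduces to subtracting r_i / A_ii from the i-th unknown, which is the usual Gauss-Seidel update.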


One sweep of Gauss-Seidel applied to (1) can be written as several iterations
in (3) with directions $d = e_1, e_2, \ldots, e_n$, where $e_i$ is the i-th column of the
n x n identity. Suppose for definiteness that A represents a uniform grid discretization
of a one-dimensional elliptic boundary value problem on a finite interval with
Dirichlet boundary conditions. Then vectors in $R^n$ are thought of as vector functions
of the interior grid points so that $e_i$ is the vector function that is zero everywhere
except at the i-th grid point (where the grid points are numbered, say,
lexicographically). It is not difficult to see that these spiked directions do not
reduce the error

(4)  $E = U - u$

very well. More precisely, the "oscillatory" (cf. [7]) error components are quickly
reduced, but the "smoother" ones are not. The natural suggestion then is to also use
smoother directions. To this end, suppose for simplicity that $n = 2^m - 1$ and define
$d_i^k$ recursively by

(5)  $d_i^m = e_i, \qquad 1 \le i \le n,$
     $d_i^k = \tfrac{1}{2} d_{2i-1}^{k+1} + d_{2i}^{k+1} + \tfrac{1}{2} d_{2i+1}^{k+1}, \qquad 1 \le i \le 2^k - 1, \quad 1 \le k < m.$

(These directions are actually intended for use with one-dimensional problems for which
(1) is a discretization. Higher dimensional versions can be defined by combinations
analogous to (5).)

These directions are progressively smoother with decreasing k. Note that $d_1^1$ is
the tent function centered at the midpoint of the interval.

With these directions, one cycle of unigrid on "level" m is now defined recursively
in terms of parameters v, y by

Level 1 cycle:  Perform one iteration via $U \leftarrow G_{d_1^1}(U)$.

Level k cycle:  Perform v sweeps on directions $d_i^k$, where one sweep is
                to do $U \leftarrow G_{d_i^k}(U)$ for $i = 1, 2, \ldots, 2^k - 1$. Now
                perform y cycles on level k - 1.
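The recursive definition above translates almost directly into code. The sketch below is an illustrative Python version for the one-dimensional case, assuming the three-point matrix tridiag(-1, 2, -1) as the fine grid operator; the helper names and the test problem are hypothetical and are not taken from the paper. Directions are regenerated each time they are used, so no coarse grid data is stored.

```python
def matvec(A, x):
    """Dense matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    """Euclidean inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def directional_step(A, f, U, d):
    """Iteration (2): U <- U - s d with s = <r, d> / <Ad, d> and r = AU - f."""
    r = [au - fi for au, fi in zip(matvec(A, U), f)]
    s = dot(r, d) / dot(matvec(A, d), d)
    return [u - s * di for u, di in zip(U, d)]


def direction(m, k, i):
    """Direction d_i^k of (5) on the finest grid (n = 2**m - 1), 1 <= i <= 2**k - 1."""
    n = 2 ** m - 1
    if k == m:
        return [1.0 if j == i - 1 else 0.0 for j in range(n)]
    left = direction(m, k + 1, 2 * i - 1)
    centre = direction(m, k + 1, 2 * i)
    right = direction(m, k + 1, 2 * i + 1)
    return [0.5 * a + b + 0.5 * c for a, b, c in zip(left, centre, right)]


def unigrid_cycle(A, f, U, m, k, v, y):
    """One level-k cycle: v sweeps over the d_i^k, then y cycles on level k - 1."""
    if k == 1:
        return directional_step(A, f, U, direction(m, 1, 1))
    for _ in range(v):
        for i in range(1, 2 ** k):
            U = directional_step(A, f, U, direction(m, k, i))
    for _ in range(y):
        U = unigrid_cycle(A, f, U, m, k - 1, v, y)
    return U


# Hypothetical test problem: 1-D three-point operator with a known solution.
m = 3
n = 2 ** m - 1
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
u_exact = [float((j + 1) * (n - j)) for j in range(n)]
f = matvec(A, u_exact)
U = [0.0] * n
for _ in range(60):
    U = unigrid_cycle(A, f, U, m, m, 1, 1)
print(max(abs(a - b) for a, b in zip(U, u_exact)))  # maximum error after 60 cycles
```

Because every update is a directional step on the fine grid vector U, the only routine that changes relative to plain Gauss-Seidel is the one that supplies the direction.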


The conventional multigrid method is a delayed displacement process of
updating the fine grid approximation U after many computations are performed on
coarser levels m - 1, m - 2, ..., 1. Though it would be computationally more
expensive, an immediate displacement multigrid process would correct U (and
compute the coarse grid equations) whenever any coarse grid approximation update
is made. A somewhat subtle analysis shows that both of these methods are in fact
equivalent; it is very easy to see that immediate replacement, and hence
conventional, multigrid is fully equivalent to unigrid if we define the fine-to-coarse
grid transfer operator $I_f^c$ in terms of the coarse-to-fine operator $I_c^f$ as

(6)  $I_f^c = (I_c^f)^T$

and if the coarse grid operator, $A^c$, is defined in terms of the fine grid operator
$A^f$ as

(7)  $A^c = I_f^c A^f I_c^f.$

(For the finest level m, $A^f = A$.) For the way in which we have defined unigrid,
$I_c^f$ is linear interpolation, although any reasonable interpolation process can be
employed. We call (6) and (7) the variational conditions because they are
naturally satisfied by finite element-type discretizations of (1).
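The variational conditions can be checked numerically on a small case. The following Python sketch assumes one-dimensional linear interpolation and the three-point matrix tridiag(-1, 2, -1) on the fine grid; the function names and grid sizes are illustrative assumptions only.

```python
def interpolation(nc):
    """Linear interpolation from nc coarse points to nf = 2*nc + 1 fine points."""
    nf = 2 * nc + 1
    P = [[0.0] * nc for _ in range(nf)]
    for i in range(nc):
        P[2 * i][i] = 0.5        # left fine neighbour
        P[2 * i + 1][i] = 1.0    # fine point coinciding with coarse point i
        P[2 * i + 2][i] = 0.5    # right fine neighbour
    return P


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def coarse_operator(A_fine, P):
    """Apply (6) and (7): restriction is the transpose of P, and A^c = P^T A^f P."""
    R = transpose(P)
    return matmul(R, matmul(A_fine, P))


nc = 3
nf = 2 * nc + 1
A_fine = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(nf)]
          for i in range(nf)]
A_coarse = coarse_operator(A_fine, interpolation(nc))
for row in A_coarse:
    print(row)
```

For this example the coarse operator is again tridiagonal, with 1 on the diagonal and -1/2 beside it, so the three-point stencil is reproduced on the coarse grid up to a scale factor. The columns of the interpolation matrix are exactly the directions of (5) one level down, which is why unigrid satisfies (6) and (7) without forming these matrices.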

We have described a version of multigrid designed for one-dimensional
discretizations. To extend unigrid to higher dimensions, it is simple to define the
corresponding smooth directions. (Actually they are the interpolants of the
coordinate functions, $e_i$, from coarse grids.)

If  unigrid  is  equivalent  to  multigrid  but  computationally  less  efficient,  then 
what  is  its  purpose?  In  addition  to  analytical  simplicity  which  leads  to  a  very 
simple  theory  [8],  unigrid  is  very  easy  to  program.  In  fact,  to  modify  an  existing 
possibly  very  complex  software  package  (say  one  that  solves  a  complex  system  of 
time  dependent  equations)  that  presently  implements  Gauss-Seidel  (or  SOR  or  some 
other  relaxation  scheme),  it  is  enough  to  modify  the  relaxation  routine.  Thus, 
design involves only computing the direction (which is equivalent to, but somewhat
simpler than, defining $I_c^f$). Implementing unigrid does not require defining any
other grid transfer operators, scale factors, or coarse grid equations.
Implementation of the design principles (6) and (7) is automatic. Moreover, unigrid does
not  impact  the  software  data  structure.  If  the  directions  are  generated  each  time 
they  are  used,  then  no  coarse  grid  information  is  stored.  Finally,  many  algorithm 
variations  can  be  implemented  and  tested  much  more  quickly  and  safely  than  with 
conventional  multigrid.  Once  the  design  is  completed,  this  multigrid  "simulator" 
may be replaced by a careful implementation of conventional multigrid with the
confidence that a good design was achieved and with the ability to use unigrid as
a  benchmark  to  ensure  the  correctness  of  the  final  product. 

To illustrate the simplicity of unigrid, we include the listing of a code
for solving

(8)  $-\Delta u + e^{x+y} u = \sin 3(x + y)$  in  $\Omega = (0, 3) \times (0, 2)$,
     $u = \cos 3(x + y)$  on  $\partial\Omega$.

It was programmed in BASIC on an HP9845 and uses the usual five-point discretization
on the fine grid (although, because of (6) and (7), it simulates nine-point
stencils on coarser grids). To apply unigrid to a more general operator in (8),
simple changes should be made to statements 210 and 350-380.

The  cycling  scheme  is  very  simple  (not  as  defined  above).  This  can  be  seen 
in  the  sample  runs  which  are  also  included  in  this  paper.  Note  that  level  1  denotes 
the  finest  level,  that  is,  where  Gauss-Seidel  sweeps  are  performed.  Note  that  the 
performance  is  the  same  for  h  =  .25  as  it  is  for  h  =  .125. 

Modifications  to  unigrid  can  be  made  very  quickly.  We  have  many  versions  now 
in  use  for  research  purposes  and  are  continuing  to  develop  others  for  further 
study (e.g., for different cycling schemes, relaxation processes, and orderings,
nonlinearities, eigenproblems, irregular boundaries, nonsymmetric and/or nonpositive
definite operators, and more general problems). No version has taken more than an
hour (and usually just a few minutes) to produce.

10  DIM U(97,65)
20  DISP "# GRIDS";
30  INPUT N
40  DISP "# X POINTS, INCL. BOUNDARY POINTS";
50  INPUT I1
60  DISP "Y POINTS";
70  INPUT J1
80  DISP "H"
90  INPUT H
100 DISP "# RELAXATIONS";
110 INPUT N0
120 DISP "CONVERGENCE TOLERANCE";
130 INPUT T
140 DISP "MAX # CYCLES";
150 INPUT C1
160 PRINT "# GRIDS=";N;" #X POINTS=";I1;" #Y POINTS=";J1;" H=";H
170 PRINT "# RELAXATIONS=";N0;" CONVERGENCE TOL=";T;" MAX # CYCLES";C1
180 C=0
190 FOR I=1 TO I1
200 FOR J=1 TO J1
210 U(I,J)=COS(3*(I+J-2)*H)
220 NEXT J
230 NEXT I
240 C=C+1
250 FOR K=1 TO N
260 M1=2^(N-K)
270 FOR N3=1 TO N0
280 E=0
290 FOR I=1+M1 TO I1-M1 STEP M1
300 FOR J=1+M1 TO J1-M1 STEP M1
310 A1=0
320 R1=0
330 FOR I3=I-M1+1 TO I+M1-1
340 FOR J3=J-M1+1 TO J+M1-1
350 D=4+EXP((I3+J3-2)*H)*H*H
360 R=D*U(I3,J3)-U(I3,J3-1)-U(I3,J3+1)-U(I3-1,J3)-U(I3+1,J3)
370 R=R-SIN(3*(I3+J3-2)*H)*H*H
380 A3=D*FND(I3,J3)-FND(I3,J3+1)-FND(I3,J3-1)-FND(I3+1,J3)-FND(I3-1,J3)
390 R1=R1+FND(I3,J3)*R
400 A1=A1+FND(I3,J3)*A3
410 NEXT J3
420 NEXT I3
430 S=R1/A1
440 E=E+R1*R1
450 FOR I3=I-M1+1 TO I+M1-1
460 FOR J3=J-M1+1 TO J+M1-1
470 U(I3,J3)=U(I3,J3)-S*FND(I3,J3)
480 NEXT J3
490 NEXT I3
500 NEXT J
510 NEXT I
520 E=SQR(E)*M1*H
530 PRINT "LEVEL=";N-K+1;" ERROR=";E
540 NEXT N3
550 NEXT K
560 IF E<T THEN 590
570 IF C<C1 THEN 240
580 DEF FND(I3,J3)=(M1-ABS(I-I3))*(M1-ABS(J-J3))/(M1*M1)
590 END


UNIGRID LISTING


# GRIDS= 3  #X POINTS= 13  #Y POINTS= 9  H= .25
# RELAXATIONS= 3  CONVERGENCE TOL= 0  MAX # CYCLES 2

LEVEL= 3  ERROR= 159.773562628
LEVEL= 3  ERROR= 1S.S376207333
LEVEL= 3  ERROR= .130523571886
LEVEL= 2  ERROR= 155.659030674
LEVEL= 2  ERROR= 1S.1746496834
LEVEL= 2  ERROR= 1.07841439539
LEVEL= 1  ERROR= 41.532174974
LEVEL= 1  ERROR= 7.5015781G532
LEVEL= 1  ERROR= 2.S6243112S18
LEVEL= 3  ERROR= 2.89527338022
LEVEL= 3  ERROR= .145588353173
LEVEL= 3  ERROR= 1.75340862750E-03
LEVEL= 2  ERROR= 2.44258558694
LEVEL= 2  ERROR= .202845696484
LEVEL= 2  ERROR= 2.21171394269E-02
LEVEL= 1  ERROR= .86192013*544
LEVEL= 1  ERROR= .21532575187
LEVEL= 1  ERROR= 7.70327718248E-02

UNIGRID SAMPLE RUN ON 13 x 9 GRID


# GRIDS= 4  #X POINTS= 25  #Y POINTS= 17  H= .125
# RELAXATIONS= 3  CONVERGENCE TOL= 0  MAX # CYCLES 2

LEVEL= 4  ERROR= 601.84198548
LEVEL= 4  ERROR= 45.7094831806
LEVEL= 4  ERROR= .634507160162
LEVEL= 3  ERROR= 594.691917146
LEVEL= 3  ERROR= 56.586100575
LEVEL= 3  ERROR= 5.86425G63696
LEVEL= 2  ERROR= 139.531317262
LEVEL= 2  ERROR= 17.0539436643
LEVEL= 2  ERROR= 5.53009378558
LEVEL= 1  ERROR= 35.7542783386
LEVEL= 1  ERROR= 7.20252546707
LEVEL= 1  ERROR= 3.3511930668
LEVEL= 4  ERROR= 9.83004014544
LEVEL= 4  ERROR= .439235064975
LEVEL= 4  ERROR= 7.32638861133E
LEVEL= 3  ERROR= 7.43719700515
LEVEL= 3  ERROR= .44716481473
LEVEL= 3  ERROR= 5.45572374272E
LEVEL= 2  ERROR= 3.7693134515
LEVEL= 2  ERROR= .515133079613
LEVEL= 2  ERROR= .147593559542
LEVEL= 1  ERROR= .737713994932
LEVEL= 1  ERROR= .173224759378
LEVEL= 1  ERROR= 8.31153127376E

UNIGRID SAMPLE RUN ON 25 x 17 GRID


SAFE LIFE DESIGN OF GUN TUBES -
SOME NUMERICAL METHODS AND RESULTS

ABSTRACT.  After firing a limited number of rounds, a gun tube may
develop  multiple  radial  cracks  emanating  from  its  boundaries.  These 
cracks  grow  under  the  cyclic  pressurization  due  to  firing  until  they 
reach  a  critical  length,  at  which  stage  catastrophic  brittle  failure  may 
occur.  The  fundamental  safety  requirement  is  that  a  tube  be  withdrawn 
from  service  before  such  failure. 

In  order  to  reduce  the  rate  of  crack  growth,  it  is  common  practice 
to  induce  compressive,  residual  stresses  at  the  bore  of  a  gun  tube  by  an 
autofrettage  process  which  involves  suitable  pressurization  or  swaging 
during  manufacture. 

In  this  paper,  we  describe  the  numerical  solution  of  a  range  of 
problems  encountered  in  the  safe-life  design  of  a  gun  tube,  namely: 

a.  The  prediction  of  residual  stress  fields  arising  from  full 
or  partial  autofrettage. 

b.  The  correction  of  these  stress  fields  to  account  for  the 
non-ideal,  Bauschinger  effect  on  unloading  of  the  tube  during 
manufacture. 

c.  Prediction  of  crack  tip  stress  intensity  factors  for 
multiple  cracks  in  pressurized,  autofrettaged  barrels  using  the 
modified  mapping  collocation  method. 

d.  Calculation  of  gun  tube  lifetime  using  stress  intensity 
factor  data  and  a  fatigue  crack  growth  law. 

Finally,  some  outstanding  problem  areas  are  noted,  and  possible 
numerical  techniques  are  proposed  for  their  solution. 

1.  INTRODUCTION.  Fatigue crack growth arising from the cyclic
pressurization of thick cylinders can produce a regular array of up to
50 equal-length radial cracks emanating from the bore [1]. A knowledge
of the crack tip stress intensity factor, K, is necessary in order to
predict the fatigue growth rate and critical length of such cracks.
Several  solutions  are  available  for  the  case  of  a  cracked,  pressurized 
thick  cylinder  [1-6].  It  is  likely  that  the  most  accurate  of  these 
solutions  are  those  derived  by  use  of  the  Modified  Mapping  Collocation 
(MMC)  method.  These  include  the  solution  in  reference  [5]  for  up  to 
four  internal  or  external  radial  cracks,  and  that  in  reference  [6]  for 
up  to  40  internal  radial  cracks.  The  errors  associated  with  the  MMC 
technique  are  generally  estimated  as  being  less  than  1%. 


1.  Materials Branch, Royal Military College of Science, Shrivenham,
SN6 8LA, UK

2.  Army Materials & Mechanics Research Center, Watertown, MA 02172

To inhibit fatigue growth of internal cracks it is common practice
to produce a more advantageous stress distribution involving residual
compressive hoop stresses near the bore, by autofrettage treatment of
the cylinder prior to use [7]. K solutions exist for a multiply-cracked,
fully autofrettaged (100% overstrain*) tube [6], [8]. Reference [6] is
an MMC solution. However, the optimum autofrettage condition may not be
100% overstrain [7] since fatigue cracks may develop at the outside
radius as a result of the relatively high tensile residual stress.

Clearly,  the  choice  of  the  optimum  overstrain  condition  will  involve  a 
consideration  of  the  rates  at  which  external  cracks  will  grow  radially 
inwards,  and  the  rates  at  which  internal  cracks  will  grow  outwards.  In 
each  case,  prediction  of  crack  growth  rate,  critical  crack  length  and 
residual  strength  will  depend  on  a  knowledge  of  the  crack-tip  stress 
intensity  factor.  The  designer  requires  accurate  stress  intensity  factors 
for  both  internally  cracked  and  externally  cracked  tubes  with  internal 
pressure,  and  any  amount  of  autofrettage  from  zero  to  100%  overstrain 
(full  autofrettage). 

Life prediction is normally based on the stress intensity factor
calibration and an associated empirical crack growth law [9]. However,
there is evidence to suggest that life predictions based on the K values
obtained from 'ideal' autofrettage distributions may significantly
overestimate the life of a given tube [10]. One possible explanation
for this is the Bauschinger effect [11], which is evident when certain
materials are loaded in compression after initial tensile loading; this
causes a reduction in the 'ideal' residual stress field following autofrettage.

Each  of  the  above  aspects  is  considered,  with  particular  emphasis 
on  the  numerical  solution  of  a  number  of  problems  encountered  in  gun 
tube  life  prediction. 

2.  THE BAUSCHINGER EFFECT.  In determining the residual stress
field in a thick cylinder which has undergone plastic deformation it is
normal to assume an elastic/perfectly plastic stress-strain curve of the
form illustrated in Fig. 1(a). This behaviour implies the same magnitude
of yield stress, Y, in tension and compression. However, the stress-strain
curve for certain gun steels may be of the form illustrated
schematically in Fig. 1(b). The significant features of this 'real'
behaviour are:

a.  A  small  amount  of  plastic  strain-hardening  (slope  fj)  typically 
a  strain  hardening  exponent  of  0.05.  This  may  alter  the  residual 
stress  field  by  4%  [12]. 

b.  A  very  small  modification  to  the  residual  stress  field  of 
approximately  1%  as  a  result  of  compressibility  of  the  material 
[13]. 

c.  There  is  a  significant  Bauschinger  effect  [11],  i.e.  the  yield 
stress  in  compression  is  less  than  that  in  tension. 


*  Overstrain is the proportion of the cylinder wall thickness that is
subjected to plastic strain during the autofrettage process.

d.  The shape of the unloading portion of the curves (CD and
C'D') is unchanged with differing amounts of plastic flow
within the plastic strain limits employed in the production
of gun tubes.

For the purposes of this paper, the 'typical' behaviour illustrated
in Fig. 1(b) is modelled as a series of straight lines, with zero strain
hardening, and a yield strength in compression of ($-\alpha Y$), Fig. 1(c).

3.  THICK CYLINDER THEORY.  Consider a tube, internal radius a,
external radius b, which is subjected to an internal pressure p, Fig. 2.
The distribution of hoop ($\sigma_\theta$) and radial ($\sigma_r$) stress in this case is
given by Lamé's equations as:

$\sigma_\theta = \frac{a^2 p}{b^2 - a^2}\left[1 + \frac{b^2}{r^2}\right], \qquad \sigma_r = \frac{a^2 p}{b^2 - a^2}\left[1 - \frac{b^2}{r^2}\right]$   (1)

where r is the radius at which the stress is defined.


Assuming elastic-perfectly plastic material properties and plane
strain conditions, employing Tresca's yield criterion, but omitting the
analysis, the pressure p* to cause yielding of the tube out to a radius
r = c (Fig. 3) is given by [11]:

$p^* = Y \ln(c/a) + \frac{Y}{2b^2}(b^2 - c^2)$   (2)

where Y is the uniaxial yield stress for the material. This gives
directly the pressure for initial yielding at the bore, $p_i^*$:

$p_i^* = \frac{Y}{2b^2}(b^2 - a^2)$   (3)

and the pressure for complete yielding of the tube:

$p^* = Y \ln(b/a)$   (4)

If the cylinder is subjected to a pressure p*, [$p_i^* < p^* < Y \ln(b/a)$],
there will be partial yielding of the tube out to a radius c, Fig. 3.
The hoop and radial stresses produced by this pressurization are:

$\sigma_\theta^* = -p^* + Y(1 + \ln(r/a)), \qquad \sigma_r^* = -p^* + Y \ln(r/a), \qquad a \le r \le c$   (5)

$\sigma_\theta^* = \frac{Yc^2}{2b^2}\left[1 + \frac{b^2}{r^2}\right], \qquad \sigma_r^* = \frac{Yc^2}{2b^2}\left[1 - \frac{b^2}{r^2}\right], \qquad c \le r \le b$   (6)
If the pressure p* is subsequently removed completely, assuming
that the unloading is entirely linearly elastic, with no reversed yielding
(valid provided b/a < 2.22), the residual hoop and radial stress
distribution, $\sigma_\theta^R$ and $\sigma_r^R$, is given by [11]:

$\sigma_\theta^R = -p^* + Y(1 + \ln(r/a)) - \frac{a^2 p^*}{b^2 - a^2}\left[1 + \frac{b^2}{r^2}\right],$
$\sigma_r^R = -p^* + Y \ln(r/a) - \frac{a^2 p^*}{b^2 - a^2}\left[1 - \frac{b^2}{r^2}\right], \qquad a \le r \le c$   (7)

$\sigma_\theta^R = \left[\frac{Yc^2}{2b^2} - \frac{a^2 p^*}{b^2 - a^2}\right]\left[1 + \frac{b^2}{r^2}\right],$
$\sigma_r^R = \left[\frac{Yc^2}{2b^2} - \frac{a^2 p^*}{b^2 - a^2}\right]\left[1 - \frac{b^2}{r^2}\right], \qquad c \le r \le b$   (8)

Clearly, a re-pressurization of the tube to a pressure p < p* will produce
a stress distribution which may be calculated by the addition of (7)
and (1) for r ≤ c, and (8) and (1) for r > c.
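The residual stress field without reversed yielding is easy to tabulate once the autofrettage pressure is known. The Python sketch below evaluates equation (2) and then equations (7) and (8) as reconstructed above; the function names and the sample values (a = 1, b = 2, c = 1.5, Y = 1) are assumptions chosen only for illustration.

```python
import math


def autofrettage_pressure(a, b, c, Y):
    """Equation (2): pressure that yields the tube out to radius c."""
    return Y * math.log(c / a) + Y * (b * b - c * c) / (2.0 * b * b)


def residual_stresses(r, a, b, c, Y):
    """Equations (7)-(8): residual (hoop, radial) stress after elastic unloading."""
    p_star = autofrettage_pressure(a, b, c, Y)
    lame = a * a * p_star / (b * b - a * a)   # coefficient of Lame's equations (1)
    ratio = (b / r) ** 2
    if r <= c:
        # Plastic zone on loading: equation (5) minus the elastic unloading field.
        hoop = -p_star + Y * (1.0 + math.log(r / a)) - lame * (1.0 + ratio)
        radial = -p_star + Y * math.log(r / a) - lame * (1.0 - ratio)
        return hoop, radial
    # Elastic zone on loading: equation (6) minus the elastic unloading field.
    k = Y * c * c / (2.0 * b * b) - lame
    return k * (1.0 + ratio), k * (1.0 - ratio)


a, b, c, Y = 1.0, 2.0, 1.5, 1.0
for r in (a, c, b):
    hoop, radial = residual_stresses(r, a, b, c, Y)
    print(r, hoop, radial)
```

Two quick checks on any such implementation are that the radial residual stress vanishes at both free surfaces (r = a and r = b) and that both components are continuous at r = c; the hoop stress at the bore should come out compressive.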

Assuming a reduced compressive yield strength of ($-\alpha Y$) as a result
of the Bauschinger effect, there is now the possibility of reversed
yielding outwards from the bore to a radius d, Fig. 4. In the region of
reversed plasticity the stresses are:

$\sigma_\theta = -\alpha Y (1 + \ln(r/a)), \qquad \sigma_r = -\alpha Y \ln(r/a), \qquad a \le r \le d$   (9)

which satisfies the two requirements that $\sigma_r = 0$ at r = a and Tresca's criterion,
namely $\sigma_\theta - \sigma_r = -\alpha Y$, $a \le r \le d$.

Consider now the elastic region r ≥ d. As a result of unloading
and yielding, the elastic-plastic interface at r = d experiences an additional
radial stress $\sigma_r$ (= -p), given by equation (9) minus equation (5), thus:

$-p = \sigma_r|_{r=d} = p^* - (1 + \alpha) Y \ln(d/a)$   (10)

Thus the stresses in the elastic region are composed of (5) and (6) plus
some additional pressure p applied at r = d as a result of unloading and
reversed plasticity.

If there is reversed yielding on unloading out to a radius d (d < c),
material at any point r > d will see the combination of unloading and yielding
as the application of an additional pressure p at radius d, such that

$\sigma_\theta = \frac{p\,d^2}{b^2 - d^2}\left[1 + \frac{b^2}{r^2}\right], \qquad \sigma_r = \frac{p\,d^2}{b^2 - d^2}\left[1 - \frac{b^2}{r^2}\right], \qquad d \le r \le b$   (11)

The requirement for the outer region d ≤ r ≤ c is that at r = d
it is just yielding. The total stresses $\sigma_\theta^T$ and $\sigma_r^T$ given by the
superposition of (5) and (11) are:

$\sigma_\theta^T = \frac{p\,d^2}{b^2 - d^2}\left[1 + \frac{b^2}{r^2}\right] + \left[-p^* + Y(1 + \ln(r/a))\right],$
$\sigma_r^T = \frac{p\,d^2}{b^2 - d^2}\left[1 - \frac{b^2}{r^2}\right] + \left[-p^* + Y \ln(r/a)\right], \qquad d \le r \le c$   (12)

But we know that Tresca's criterion applies, thus:

$(\sigma_\theta^T - \sigma_r^T) = -\alpha Y$  at r = d   (13)

and from (12) and (13)

$-\alpha Y = Y + p\,\frac{d^2}{b^2 - d^2}\,\frac{2b^2}{d^2}$   (14)

But the interface pressure is given by equation (10). Thus, combining
(10) and (14):

$p^* = (1 + \alpha) Y \left[\frac{b^2 - d^2}{2b^2} + \ln(d/a)\right]$   (15)

Substituting from (10) into (12), recognizing that $p = -\sigma_r|_{r=d}$:

$\sigma_\theta^T = -\{p^* - (1 + \alpha) Y \ln(d/a)\}\,\frac{d^2}{b^2 - d^2}\left[1 + \frac{b^2}{r^2}\right] + \left[-p^* + Y(1 + \ln(r/a))\right],$
$\sigma_r^T = -\{p^* - (1 + \alpha) Y \ln(d/a)\}\,\frac{d^2}{b^2 - d^2}\left[1 - \frac{b^2}{r^2}\right] + \left[-p^* + Y \ln(r/a)\right], \qquad d \le r \le c$   (16)

Superimposing (6) and (11) and substituting from (10):

$\sigma_\theta^T = \left[\frac{Yc^2}{2b^2} - \{p^* - (1 + \alpha) Y \ln(d/a)\}\,\frac{d^2}{b^2 - d^2}\right]\left[1 + \frac{b^2}{r^2}\right],$
$\sigma_r^T = \left[\frac{Yc^2}{2b^2} - \{p^* - (1 + \alpha) Y \ln(d/a)\}\,\frac{d^2}{b^2 - d^2}\right]\left[1 - \frac{b^2}{r^2}\right], \qquad c \le r \le b$   (17)


Equations  (7)  and  (8)  together  with  (2)  define  the  residual  stress  field 
after  removal  of  autofrettage  pressure  when  there  is  no  reversed  yielding, 
whilst  equations  (9),  (16),  (17)  together  with  (2)  and  (15)  define  the 
residual  stress  field  in  instances  where  reversed  yielding  occurs. 


For yielding not to occur on unloading:

$(\sigma_\theta - \sigma_r)|_{r=a} > -\alpha Y$   (18)

i.e. from equation (7):

$p^* < (1 + \alpha)\,\frac{Y}{2b^2}(b^2 - a^2)$   (19)

or in terms of the pressure for initial yielding $p_i^*$, equation (3), for
no reversed yielding:

$p^* < (1 + \alpha)\,p_i^*$   (20)

For example, consider a cylinder with b/a = 2, α = 0.5; then from (20),
for no reversed yielding:

$p^* < 1.5\,p_i^*$

Now since:

$p_i^* = \frac{Y}{2b^2}(b^2 - a^2) = .375\,Y$

we obtain from (2):

$1.5\,p_i^* = .5625\,Y = Y \ln(c/a) + \frac{Y}{2b^2}(b^2 - c^2)$

A straightforward iterative process gives c/a = 1.33, thus any overstrain
in excess of 33% will cause reversed yielding at the bore.

Clearly, it will also be necessary to iterate on equation (15) in
order to calculate d. Residual stress distributions for cylinder ratios
(b/a) of 2.0 and 3.0 are shown in Fig. 5, for 0.25 ≤ α ≤ 1.0.
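The two iterations mentioned above, on equation (2) for c and on equation (15) for d, share the same form and can be handled by one bisection routine. The Python sketch below is an illustration only: the paper does not say which iterative process was used, and the function names are hypothetical. It reproduces the worked example (b/a = 2, α = 0.5) and then solves (15) for d at an assumed overstrain radius c = 1.5a.

```python
import math


def yield_pressure(x, a, b, Y, factor=1.0):
    """factor * Y * [ln(x/a) + (b^2 - x^2)/(2 b^2)].

    factor = 1 gives equation (2) with x = c;
    factor = 1 + alpha gives equation (15) with x = d.
    """
    return factor * Y * (math.log(x / a) + (b * b - x * x) / (2.0 * b * b))


def solve_radius(p_star, a, b, Y, factor=1.0, tol=1e-12):
    """Bisection on [a, b]; the expression increases monotonically on this interval."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if yield_pressure(mid, a, b, Y, factor) < p_star:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a, b, Y, alpha = 1.0, 2.0, 1.0, 0.5

# Largest autofrettage pressure with no reversed yielding, equation (20).
p_limit = (1.0 + alpha) * yield_pressure(a, a, b, Y)
c_limit = solve_radius(p_limit, a, b, Y)
print(p_limit, c_limit)   # pressure limit and the corresponding c/a

# Reversed-yield radius d for an assumed heavier overstrain, c = 1.5 a.
p_star = yield_pressure(1.5, a, b, Y)
d = solve_radius(p_star, a, b, Y, factor=1.0 + alpha)
print(d)
```

Bisection is chosen here because the bracketed expression is monotonic in the radius between a and b, so a root in that interval is unique whenever the pressure lies between the values at the two ends.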

4.  PREDICTION OF CRACK TIP STRESS INTENSITY FACTORS BY MODIFIED
MAPPING COLLOCATION (MMC).  Complex variable methods, due to Muskhelishvili
[14], are utilized. Stresses and displacements within a body are represented
in terms of complex stress functions. By employing an MMC technique as
described in [5, 15], the cracked ring segment in the physical (z) plane,
Fig. 6, is mapped from a rectangular region in the parameter plane.
Traction-free conditions along A'B' and D'E' in the parameter plane are
ensured. The singularity is removed from the parameter plane by mapping
a unit semi-circle onto the appropriate crack surfaces, Fig. 6. A
series representation of the stress function is selected, which ensures
appropriate symmetry conditions. The stress and displacement boundary
conditions applicable to the problem in the physical (z) plane are:
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over 

DC 
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BC 

where  p(r)  is  equal  and  opposite  to  the  loading  along  the  crack  line  in 
the  unflawed  structure  for  the  case  of  internal  pressure,  autofrettage 
or  thermal  loading,  the  latter  two  stress  states  being  essentially 
equivalent  [16]. 

In  the  MMC  method  the  infinite  series  representations  of  the  complex 
stress  functions  are  truncated  to  a  finite  number  of  terms.  Force  conditions 
are  imposed  at  selected  boundary  points  along  CD,  EF,  and  FG,  which 
gives  conditions  on  the  unknown  coefficients  in  the  stress  functions. 

Thus  each  boundary  point  produces  two  rows  in  the  main  matrix  A,  and  two 
corresponding  elements  in  the  boundary  conditions  vector  b,  where: 

A  x  =  b 

and x is the vector of unknown coefficients. In general A is a matrix
of $\ell$ rows and m columns, where $\ell$ and m depend upon the number of boundary
points and unknown coefficients respectively. It was found that convergence
is generally better when $2m < \ell < 2.5m$; this agrees with the findings of other
workers [17]. A least-squares error minimization procedure was used to
solve the overdetermined set of linear equations.

Knowing the coefficients for the stress function in the cracked
region, the crack shape and stress intensity factor, K, may be determined
in a straightforward manner [14]. In all cases considered there is
symmetry of loading and geometry about the crack line, the only non-zero
stress intensity factor being $K_I$.
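The overdetermined collocation system A x = b described above can be solved in the least-squares sense in several ways, and the paper does not say which was used. As a minimal sketch, the Python code below forms the normal equations and solves them by Gaussian elimination with partial pivoting; the function name and the small test system are assumptions for illustration. For the larger, less well-conditioned systems that arise in practice an orthogonal factorization would usually be the safer choice.

```python
def least_squares(A, b):
    """Minimize ||A x - b||_2 via the normal equations (A^T A) x = A^T b."""
    m = len(A[0])                       # number of unknown coefficients
    # Normal matrix N = A^T A and right-hand side g = A^T b.
    N = [[sum(row[i] * row[j] for row in A) for j in range(m)] for i in range(m)]
    g = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(N[r][col]))
        N[col], N[piv] = N[piv], N[col]
        g[col], g[piv] = g[piv], g[col]
        for r in range(col + 1, m):
            factor = N[r][col] / N[col][col]
            for k in range(col, m):
                N[r][k] -= factor * N[col][k]
            g[r] -= factor * g[col]
    # Back substitution.
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (g[i] - sum(N[i][k] * x[k] for k in range(i + 1, m))) / N[i][i]
    return x


# Hypothetical system with 3 rows and 2 unknowns (a straight-line fit).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [0.0, 1.0, 1.0]
print(least_squares(A, b))
```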

Results for internal pressure in the bore and cracks (each of
length $\ell$) are presented in Fig. 7 for b/a ratio of 2.0. Equivalent
results for the case of full (100%) ideal autofrettage and steady-state
thermal stressing appear in Fig. 8. The form of the results at short
crack lengths is shown in Fig. 9, indicating good convergence to the
limiting value. Again, the short crack length convergence is good, as
is that at longer crack lengths. For the particular case of 50 cracks,
and b/a varying from 1.2 to 2.0, results for full autofrettage are shown
in Fig. 10. By superposition of these results it is possible to determine
K for any combination of internal pressure, full autofrettage or steady-state
thermal loading. Furthermore, provided the crack tips do not
extend beyond the minimum radius to which plastic flow was induced
during the autofrettage process, it is also possible to obtain K values
for partial autofrettage by a straightforward superposition [18].

A  set  of  results  for  internal  cracks  with  internal  pressure  and  50% 
overstrain  based  on  the  results  of  reference  [6],  and  the  superposition 
described  in  [18],  is  presented  graphically  in  Fig.  11.  A  further  set 
of  results  for  external  cracks  with  internal  pressure  and  50%  overstrain, 
based  on  the  results  of  reference  [5]  is  presented  graphically  in 
Fig.  12. 


5.  CALCULATION  OF  TUBE  LIFETIME,  The  prediction  of  life  using 
Linear  Elastic  Fracture  Mechanics  and  a  crack  growth  law  is  well  known 
[9].  It  consists  of  defining  the  stress  intensity  range  AK  as: 


AK  = 

K  -  K 

max  min 

K  .  > 

mm 

0 

(21) 

AK  = 

K 

max 

K  .  < 

mm  — 

0 

(22) 

where $K_{\max}$ and $K_{\min}$ are the effective maximum and minimum stress
intensity values respectively during a given loading cycle. Equation
(22) implies that the part of the fatigue cycle during which the crack
is closed at its tip (i.e. $K \le 0$) makes no contribution to crack growth.
For much of a component's lifetime, the fatigue crack growth rate is
related to the stress intensity factor range by [9]:

$$\frac{d\ell}{dN} = C(\Delta K)^{M} \tag{23}$$

where N represents the number of cycles, and C and M are experimentally
determined constants. In general C and M are also functions of the R
ratio, where:

$$R = K_{\min} / K_{\max}, \qquad K_{\min} > 0 \tag{24}$$

$$R = 0, \qquad K_{\min} \le 0 \tag{25}$$
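The definitions in equations (21) to (25) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the guard that returns zero range when the whole cycle is at or below zero stress intensity is an assumption added here, not a statement from the paper.

```python
def stress_intensity_range(k_max: float, k_min: float) -> float:
    """Effective range Delta K following equations (21) and (22)."""
    if k_max <= 0.0:
        # Assumption (not in the paper): crack closed for the whole cycle.
        return 0.0
    # Equation (21) when K_min > 0, equation (22) otherwise.
    return k_max - k_min if k_min > 0.0 else k_max


def r_ratio(k_max: float, k_min: float) -> float:
    """R ratio following equations (24) and (25)."""
    return k_min / k_max if k_min > 0.0 else 0.0


def growth_rate(delta_k: float, c: float, m: float) -> float:
    """Crack growth per cycle, equation (23): dl/dN = C * (Delta K)**M."""
    return c * delta_k ** m
```

A negative minimum stress intensity, such as one produced by a compressive residual field, therefore leaves only the tensile part of the cycle in the range and sets R to zero.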


However,  in  this  paper  we  ignore  the  (relatively)  limited  effects 
of  changing  R  ratio,  and  emphasize  the  effects  of  the  residual  stress 
field  contributions  to  K.  The  question  of  changing  R  ratio  during 
fatigue  crack  growth  through  a  residual  stress  field  is  considered  in 
detail  in  [19].  Note  that  there  does  not  appear  to  be  any  reason  to 
assume  that  the  superposition  principle  is  violated  by  'stress  fading' 
during  fatigue  crack  growth  through  residual  stress  fields  at  stress 
levels  which  only  produce  localized  (crack  tip)  yielding  [19]. 

Consider a tube containing a residual stress field. When a crack
is introduced it has a residual stress intensity $K_I^{R}$. The tube is then
subjected to a cyclic pressure loading. The stress intensity contributions
produced by this loading are $K_{I\max}^{P}$ and $K_{I\min}^{P}$, the maximum and minimum
values of stress intensity produced by the pressure loading.

In general we note that equations (21) and (22) give:

$$\Delta K = K_{I\max}^{P} - K_{I\min}^{P}, \qquad R = \frac{K_{I\min}^{P} + K_I^{R}}{K_{I\max}^{P} + K_I^{R}}, \qquad K_{I\min}^{P} + K_I^{R} > 0 \tag{26}$$

$$\Delta K = K_{I\max}^{P} + K_I^{R}, \qquad R = 0, \qquad K_{I\min}^{P} + K_I^{R} \le 0 \tag{27}$$


In order to predict lifetime to failure for a gun tube it is
necessary to rearrange equation (23) to give

$$\text{Number of cycles, } N = \int_{\ell_0}^{\ell_c} \frac{d\ell}{C(\Delta K)^{M}} \tag{28}$$

where $\ell_0$ is some appropriate initial crack length, and $\ell_c$ is the
critical crack length at which catastrophic brittle failure will occur.
In general, the integral cannot be evaluated exactly and it is necessary
to integrate in a step-wise fashion in order to determine total lifetime
[20]. For instance, assuming a tube with internal radius 50 mm and
external radius 100 mm containing an array of 40 radial cracks, each of
length 5 mm, we may calculate the lifetime of the tube at working pressures
of 400, 450 and 500 MNm$^{-2}$, for varying amounts of autofrettage from 0 to
100%. [The material properties assumed were: Yield strength 1200 MNm$^{-2}$,
Fracture Toughness 90 MNm$^{-3/2}$, Empirical crack growth constants, M = 3.1,
$C = 1.455 \times 10^{[\ldots]}$ for crack growth in metres/cycle.]
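The step-wise integration of the lifetime integral can be sketched as follows. The stress intensity calibration used here, $\Delta K = Q\,p\,(\pi\ell)^{1/2}$ with a constant factor `Q_HYP`, is a hypothetical stand-in for the numerical K solutions of the paper, and `C_HYP` is a placeholder growth constant chosen only for illustration; neither value is taken from the text. The pressure, fracture toughness, exponent and initial crack length are the quoted ones.

```python
import math


def cycles_to_failure(delta_k, l0, lc, c, m, steps=2000):
    """Midpoint-rule estimate of N = integral from l0 to lc of dl / (C * dK(l)**M).

    Geometrically spaced steps are used so that short crack lengths,
    where most of the life is spent, are resolved finely.
    """
    if not 0.0 < l0 < lc:
        raise ValueError("need 0 < l0 < lc")
    ratio = (lc / l0) ** (1.0 / steps)
    cycles = 0.0
    lo = l0
    for _ in range(steps):
        hi = lo * ratio
        mid = 0.5 * (lo + hi)
        cycles += (hi - lo) / (c * delta_k(mid) ** m)
        lo = hi
    return cycles


Q_HYP = 1.0      # hypothetical constant configuration factor (assumption)
C_HYP = 1.0e-11  # placeholder growth constant, NOT the paper's value
P = 400.0        # working pressure, MN/m^2 (from the text)
K_C = 90.0       # fracture toughness, MN/m^(3/2) (from the text)
M = 3.1          # growth law exponent (from the text)
L0 = 0.005       # initial crack length, m (from the text)


def delta_k(length):
    """Hypothetical calibration: dK = Q * p * sqrt(pi * l)."""
    return Q_HYP * P * math.sqrt(math.pi * length)


# Critical length at which dK reaches K_C for this assumed calibration.
L_CRIT = (K_C / (Q_HYP * P)) ** 2 / math.pi
N_STEPWISE = cycles_to_failure(delta_k, L0, L_CRIT, C_HYP, M)
```

With a constant factor the integral also has a closed form, which gives a convenient check on the step size before a tabulated K calibration is substituted for `delta_k`.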


The  results  for  this  particular  case  are  illustrated  as  continuous 
lines  in  Fig.  13,  and  would  lead  to  the  initial  conclusion  that  the 
largest  possible  amount  of  autofrettage  is  required.  However,  this  may 
not  be  the  case.  The  dotted  lines  on  Fig.  13  represent  the  lifetime  for 
an  external  crack  of  initial  length  0.05mm  which  grows  radially  inward 
under the same internal cyclic pressure. Thus, for pressures of 500 MNm$^{-2}$
there would be no advantage in exceeding 27% overstrain, since tube
lifetime  is  then  limited  by  growth  of  the  external  crack.  Indeed,  any 
increase  in  overstrain  would  tend  to  increase  the  growth  rate  of  the 
external  crack  by  causing  an  increase  in  R  value.  Whilst  the  relative 
positions  of  the  lifetime  curves  in  Fig.  13  will  vary  with  material, 
initial  crack  lengths,  working  pressures  and  the  nature  of  the  residual 
stress  fields,  the  general  approach  to  the  selection  of  an  optimum 
autofrettage  overstrain  will  be  the  same. 

6. CONCLUSIONS AND FUTURE WORK

a) The residual stress field in an autofrettaged gun tube may be
calculated  exactly  by  assuming  a  simple  Bauschinger  effect  model  which 
accounts  for  a  lower  magnitude  of  the  yield  strength  in  compression  than 
that  in  tension.  Future  work  should  address  the  problem  of  modelling 
the  non-linear  unloading  effects  which  accompany  this  reduced  yield 
strength. 


b) The modified mapping collocation (MMC) method produces accurate,
two-dimensional stress intensity factor solutions for the case of a
cracked, internally pressurized tube with autofrettage. Of particular
note is the accuracy of the selected MMC technique at short crack lengths,
wherein lifetime estimates are most critical. A straightforward
superposition allows these results to be extended to the case of partial
autofrettage. Future work should include the proper representation of
three-dimensional cracked configurations (e.g. thumbnail cracks).

c)  Gun  tubes  may  develop  an  array  of  multiple  cracks.  Future  work 
should  be  aimed  at  understanding  the  factors  influencing  the  stability 
of  such  patterns,  and  the  effects  of  residual  stresses  on  such  stability. 

d) Crack growth rates due to the cyclic pressurization of gun
tubes may be predicted from a knowledge of the crack tip stress intensity
factor range. The optimum autofrettage condition may not be 100% overstrain,
since external cracks may grow inwards and produce failure. Life prediction
design curves may be generated which permit a selection of the optimum
autofrettage condition. Whilst crack initiation time for internal
cracks is effectively zero, there is a definite initiation period for
external cracks. This initiation time should be quantified to allow
accurate lifetime design predictions.
Figure 1: Elastic-Plastic Stress-Strain Curves.

Figure 2: Internally Pressurized Thick Cylinder.

Figure 3: Internally Pressurized, Partially Plastic Thick Cylinder.

Figure 4: Unpressurized, Autofrettaged Thick Cylinder With Reversed Yielding.

Figure 5: Residual Stress Fields in Various Autofrettaged Tubes With Reversed Yielding.

Figure 6: Physical and Mapped Planes.

Figure 10: Stress Intensity Factors for Autofrettaged Tube With 50 Radial Cracks.

Figure 13: Calculated Lifetimes for Tube with 40 Internal Cracks or One External Crack Subjected to Cyclic Pressure Loadings of 400, 450 and 500 MNm$^{-2}$ (horizontal axis: overstrain, %).
GUN TUBE FATIGUE LIFE ESTIMATES -
INFLUENCE  OF  RESIDUAL  STRESS,  CRACK  GROWTH 
LAW  AND  LOAD  SPECTRA 


Donald  M.  Neal 
Anthony  P.  Parker 
Edward  M.  Lenoe 

US  Army  Materials  and  Mechanics  Research  Center 
Watertown,  Massachusetts,  02172 


ABSTRACT.  A  gun  tube  should  be  withdrawn  from  service  before 
crack-like  defects  within  the  tube  can  achieve  a  critical  length  and 
cause  catastrophic  brittle  failure.  The  objective  of  this  study  is  to 
conduct  a  sensitivity  analysis  of  the  relative  importance  of  fracture 
toughness, yield strength, proportion of autofrettage, initial crack
length, firing pressure and crack growth law parameters in the determination
of safe life estimates for gun tubes. By recognizing the importance of
each individual parameter in the life prediction procedure, requirements
for accurate determination of that parameter can be established.

The Monte Carlo method was applied in order to simulate parameter
variability in the lifetime estimating process. A normal distribution
function was assumed, with a specific coefficient of variation (C.V.) describing
the relative amount of variability. The largest dispersion in the lifetime
estimates resulted from a 5% variation in the power term of the crack
growth law. The other parameters contributed considerably less to
the life variability. The results also indicated a considerable
advantage when the autofrettage process was applied to the gun tube.

The  lognormal  probability  density  function  "best"  represented  probability 
ranked  life  estimates  when  compared  to  the  Weibull  and  normal  functions. 
NOMENCLATURE


$a$          Inner tube radius
$b$          Outer tube radius
$c$          Autofrettage radius
$C$          Coefficient in Paris' crack growth law
$F$          Proportion of autofrettage
$K$          Stress intensity factor (range)
$K_{\max}$   Maximum value of stress intensity during loading cycle
$K_{\min}$   Minimum value of stress intensity during loading cycle
$K_c$        Fracture toughness
$\ell$       Crack length
$\ell_i$     Initial crack length
$\ell_c$     Critical crack length
$m$          Exponent in Paris' law
$N$          Number of loading cycles
$p$          Pressure
$Q$          Configuration correction factor
$Y$          Yield strength
$\alpha$     Factor employed in determination of K for partial autofrettage
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1.  INTRODUCTION.  The  fundamental  safety  requirement  for  a  gun 
tube is that it should be withdrawn from service before crack-like
defects  which  develop  in  the  tube  during  initial  firing  can  grow  to  a 
critical  length  and  cause  catastrophic,  brittle  failure  of  the  tube. 
Ideally  the  fatigue  life  should  exceed  the  wear  life  of  the  tube,  and 
tube  inspection  should  not  be  necessary  during  service  life.  However, 
there  have  been  in-service  failures  of  gun  tubes  [1],  and  there  is 
evidence  to  suggest  that  a  relatively  small  increase  in  firing  pressures 
(e.g.  for  the  firing  of  long-rod  projectiles)  or  an  improvement  in  wear 
characteristics  of  gun  tubes  may  make  fatigue  life  the  dominant  limiting 
factor  in  life  assessment  [2]. 

A  linear-elastic  fracture  mechanics  approach  to  crack  growth  rate 
prediction  implies  the  need  to  calculate  accurate  stress  intensity 
factor  data,  and  to  fully  understand  the  effect  of  autofrettage  residual 
stresses  [3]  and  multiple  cracking  on  stress  intensity  calibrations. 
Deterministic  studies  relating  to  each  of  these  problem  areas  are  reported 
elsewhere  in  this  publication  [3].  The  objective  of  this  study  is  to 
conduct  a  sensitivity  analysis,  utilizing  standard  Monte  Carlo  simulation 
techniques,  in  order  to  gain  some  understanding  of  the  relative  importance 
of fracture toughness, yield strength, proportion of autofrettage,
initial crack length, firing pressure and crack growth law on the fatigue
life of a gun tube.

2. METHOD OF LIFE PREDICTION. For much of the lifetime of a
cracked component, the fatigue crack growth rate is given by Paris' law:

$$\frac{d\ell}{dN} = C(K)^{m} \tag{1}$$

where $\ell$ is the crack length, N is the number of cycles and K is the
stress intensity factor range, $K_{\max} - K_{\min}$ [3], where $K_{\max}$ and $K_{\min}$ are
the maximum and minimum values respectively of the stress intensity
during the loading cycle. C and m are empirical constants, which are
determined for the particular material and thickness in a standard test.

In order to predict lifetime to failure for a gun tube, we write
equation (1) such that:

$$N = \int_{\ell_i}^{\ell_c} \frac{d\ell}{C(K)^{m}} \tag{2}$$

where $\ell_i$ is some initial crack length (in the case of a gun tube this is
normally taken as the depth of the heat-check craze cracking which
appears at the bore after the first few rounds are fired) and $\ell_c$ is the
critical crack length associated with some critical value of K,
termed the fracture toughness and designated $K_c$.
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A  typical  cracked  gun-tube  geometry  is  illustrated  in  Fig.  1.  The 
tube  has  internal  radius  a,  external  radius  b  and  has  been  autofrettaged 
to  a  radius  c.  [Autofrettage  is  a  process  in  which  plastic  flow  is 
induced  in  the  tube  during  manufacture.  This  plastic  flow  commences  at 
the bore, and spreads radially outwards. The process induces an advantageous
distribution of compressive residual stresses in the inner
portion  of  the  tube  which  tend  to  reduce  the  stress  intensity  of  cracks 
emanating  from  the  inner  radius.] 

In the case of a pressurized tube, it is standard practice to
express the stress intensity factor range, K, as:

$$K = Q(\ell)\, p\, (\pi \ell)^{1/2} \tag{3}$$

where p is the maximum pressure during the firing cycle, and $Q(\ell)$ is
some configuration correction factor which includes the effects of
loading and geometry. In the case of an autofrettaged tube:

$$K = \alpha K_p + K_a \tag{4}$$

where $K_p$ is the stress intensity contribution due to internal pressure
in the bore and the cracks, and $K_a$ is the (negative) stress intensity due to
full (100%) autofrettage (i.e. c = b). Numerical solutions for $K_p$ and
$K_a$ appear in [3]. $\alpha$ is a function of the ratio of material yield strength
Y to working pressure, p, given by:

$$\alpha = 1 + \frac{Y}{p}\left[\ln\frac{b}{c} - \frac{b^2 - c^2}{2b^2}\right] \tag{5}$$

whilst the autofrettage radius, c, is given by:

$$c = F(b - a) + a \tag{6}$$

F being the proportion of autofrettage (i.e. percentage overstrain = 100
x F).
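As a worked illustration of the partial autofrettage superposition, the sketch below evaluates the autofrettage radius and the factor $\alpha$. It assumes that the superposition factor has the form $\alpha = 1 + (Y/p)[\ln(b/c) - (b^2 - c^2)/(2b^2)]$, which is what ideal (Tresca) elastic-plastic theory gives for a crack tip inside the plastic zone; the function names are hypothetical, and $K_p$ and $K_a$ must be supplied from a separate stress intensity calibration.

```python
import math


def autofrettage_radius(a: float, b: float, f: float) -> float:
    """Autofrettage radius c = F*(b - a) + a for proportion F in [0, 1]."""
    return f * (b - a) + a


def alpha_factor(y: float, p: float, b: float, c: float) -> float:
    """Assumed superposition factor for partial autofrettage to radius c."""
    return 1.0 + (y / p) * (math.log(b / c) - (b * b - c * c) / (2.0 * b * b))


def combined_k(k_p: float, k_a: float, y: float, p: float, b: float, c: float) -> float:
    """Total stress intensity K = alpha*K_p + K_a (K_a is negative)."""
    return alpha_factor(y, p, b, c) * k_p + k_a


# Values quoted in the text: a = 50 mm, b = 100 mm, Y = 1200, p = 400 MN/m^2.
C_75 = autofrettage_radius(50.0, 100.0, 0.75)      # radius for 75% overstrain
ALPHA_75 = alpha_factor(1200.0, 400.0, 100.0, C_75)
```

For full autofrettage (c = b) the bracket vanishes and $\alpha = 1$, so the expression reduces to the simple sum of the pressure and full-autofrettage contributions.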


3. SELECTED PARAMETERS. The mean parameter values considered are
listed in Fig. 2. The working pressure, p (400 MNm$^{-2}$), material yield
strength, Y (1200 MNm$^{-2}$) and fracture toughness, $K_c$ (90 MNm$^{-3/2}$) were
selected to be typical of gun tube operation and material. The proportion
of overstrain, F, spans the whole range from zero to 100% autofrettage.
The initial crack length, $\ell_i$, is typical of measured heat-check crack depths.
The Paris' law constants, $C = 1.45 \times 10^{[\ldots]}$ and m = 3.1 (giving crack
growth rates in meters/cycle from stress intensity in MNm$^{-3/2}$) are also
characteristic of gun steel. The tube was assumed to have an inner radius
of 50 mm and outer radius 100 mm.
The coefficient of variation (C.V.) in each of the parameters is
taken  as  5%  throughout,  with  the  exception  of  working  pressure  (2%)  and 
initial  crack  length  (10%),  in  order  to  model  some  of  the  "real-life" 
variations.  The  parameters  marked  with  an  'O'  in  Fig.  2  were  not  varied 
during  the  tests,  since  Y  cannot  influence  life  with  zero  autofrettage, 
and  variations  of  autofrettage  below  zero  and  above  full  autofrettage 
are  physically  unacceptable.  All  of  the  parameters  marked  'XX'  were 
insensitive  at  the  C.V.  levels  employed. 

4. MONTE CARLO SIMULATION. In the simulation scheme a probability
distribution function for N described in (2) is determined. The necessary
parameters C, m, $\ell_i$ and those related to K (equations (4), (5) and (6)) are
represented by a normal distribution function with appropriate means and
C.V. (see Fig. 2).

A random selection from each of the parameter distributions is
inserted in (2) and a solution for N is obtained. This process is repeated
until all functional values have been selected. Note that an equal number
of random values should be generated for each individual parameter. The
resultant number of values in the lifetime distribution will be the same as
the number generated for each parameter. Although an initial assumption of
normality existed for each of the parameters, the resultant life estimate
distribution was not normal. This situation often occurs in the Monte Carlo
process.

The random numbers for the individual functions are obtained by
generating uniform random numbers and solving for X in the relation

$$R = \int_{-\infty}^{X} f_n(x)\, dx \tag{7}$$

where R = uniform random number and
$f_n$ = normal frequency distribution.

If the probability distributions of the controlling parameters are
known from some experimental results or from an analytic basis, then the
appropriate distribution function may be used in place of $f_n$.
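The sampling step, in which a uniform random number is converted to a normally distributed parameter value by inverting the cumulative distribution, can be sketched with the Python standard library. The helper names are hypothetical, the standard deviation is taken as C.V. times the mean, and only two of the parameters (p and m, with the C.V. values quoted in the text) are shown.

```python
import random
from statistics import NormalDist


def sample_normal(mean: float, cv: float, rng: random.Random) -> float:
    """Draw one value by solving R = CDF(X) for X with uniform R."""
    if cv == 0.0:
        return mean  # parameter held fixed
    dist = NormalDist(mu=mean, sigma=abs(cv * mean))
    r = rng.random()
    while r == 0.0:  # inv_cdf is undefined at exactly 0
        r = rng.random()
    return dist.inv_cdf(r)


def draw_parameters(means: dict, cvs: dict, n_trials: int, seed: int = 0) -> list:
    """Generate n_trials parameter sets, one value per parameter per trial."""
    rng = random.Random(seed)
    return [
        {name: sample_normal(means[name], cvs[name], rng) for name in means}
        for _ in range(n_trials)
    ]


MEANS = {"p": 400.0, "m": 3.1}   # mean values quoted in the text
CVS = {"p": 0.02, "m": 0.05}     # coefficients of variation quoted in the text
TRIALS = draw_parameters(MEANS, CVS, 2000, seed=1)
```

Each parameter set in `TRIALS` would then be passed to the life integral to give one lifetime, so the number of lifetimes equals the number of parameter sets.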

An examination of the relative change in the third and fourth
moments (skewness and kurtosis) with increasing number of
simulations provided the necessary mechanism for determining an acceptable
number of simulations. The tabulation of moments against number of
trials in Fig. 3 indicated that approximately 2000 simulations would be
sufficient. Acceptable convergence of skewness and kurtosis indicates that
the form of the distribution no longer changes as the number of trials
increases.
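A convergence check of this kind can be sketched as below. The moment definitions are the usual standardized central moments, which is an assumption about how the tabulated values were computed, and the lognormal samples are only a stand-in for simulated lifetimes.

```python
import random


def central_moment(xs, k):
    """k-th central moment of a sample (population form)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** k for x in xs) / len(xs)


def skewness(xs):
    """Standardized third moment."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Standardized fourth moment (equal to 3 for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def moment_history(samples, checkpoints):
    """Skewness and kurtosis of the first n samples for each checkpoint n."""
    return [
        (n, skewness(samples[:n]), kurtosis(samples[:n]))
        for n in checkpoints
        if n <= len(samples)
    ]


# Stand-in lifetimes (hypothetical), used only to exercise the functions.
_rng = random.Random(0)
LIVES = [_rng.lognormvariate(7.0, 0.3) for _ in range(2000)]
HISTORY = moment_history(LIVES, [250, 500, 1000, 2000])
```

The simulation count is judged adequate when successive rows of `HISTORY` change by less than a chosen tolerance.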
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5.  SENSITIVITY  ANALYSIS.  Fig.  4  shows  probability  density  histograms 
for  the  case  of  a  non-autofrettaged  tube  with  40  internal  radial  cracks. 
This  number  was  selected  as  being  typical  of  crack  patterns  observed  in 
non-autofrettaged gun tubes. The results indicate a relative insensitivity
to variations of 2 and 5 percent for p and $K_c$ respectively,
but a most significant dependence on m with 5% variation. Fig. 4(d)
shows the effect of varying all parameters simultaneously. The 99.9% life
with all parameters varying is 111 rounds (i.e. the probability that the
lifetime of the tube will exceed 111 rounds is 0.999).

Fig.  5  illustrates  the  results  for  a  tube  with  75%  autofrettage 
(i.e.  F  =  0.75)  with  4  internal  radial  cracks.  This  number  appears 
typical  of  crack  patterns  in  autofrettaged  tubes.  The  parameters,  in 
increasing order of sensitivity, are $\ell_i$, F and m. The 99.9% life in this
case with all parameters varying is 1869 rounds.

Finally,  Fig.  6  illustrates  the  results  for  100%  autofrettage  (F  = 
1.0), with 4 internal cracks. The sensitive parameters, in increasing
order, are p, $\ell_i$ and m. The 99.9% life is 4683 rounds. With increasing
amounts  of  autofrettage,  it  becomes  more  important  to  include  variations 
in  all  parameters,  not  just  m  (see  Figs.  4,  5,  6). 

As the amount of autofrettage is increased, the effect of $K_c$ on variability
decreases while $\ell_i$ exhibits the opposite characteristic. This appears
reasonable  on  physical  grounds,  since  more  of  the  tube  lifetime  is 
expended  at  very  short  crack  lengths  as  the  amount  of  autofrettage  is 
increased.  Conversely,  the  proportion  of  life  spent  at  longer  crack 
lengths  becomes  less  significant. 

Inspection  of  the  variation  in  mean,  standard  deviation  and  99.9% 
life  indicates  the  very  considerable  advantages  associated  with  the 
autofrettage  process.  In  particular,  the  99.9%  life  is  increased  from 
111  rounds  with  zero  autofrettage,  to  1869  with  75%  autofrettage  and 
4683  with  100%  autofrettage.  Since  gun  tubes  would  normally  be  required 
to  guarantee  something  like  1500  rounds,  and  a  factor  of  safety  is 
required,  it  is  clear  that  the  non-autofrettaged  tube  would  not  be 
acceptable,  whilst  the  75%  and  100%  autofrettage  tube  would  represent 
viable  options,  the  only  additional  cost  being  the  autofrettage  process 
itself,  no  modifications  to  the  material  being  required. 

6.  CUMULATIVE  PROBABILITY  FITTING.  Considering  a  random  selection 
of  300  data  values  with  parameter  variations  of  the  specified  amount 
listed  in  Fig.  2,  we  obtain  the  Normal,  Weibull  and  Lognormal  density 
function  representation  of  the  ranked  data  illustrated  in  Figs.  7,  8  and 
9, for zero, 75% and 100% overstrain respectively. In all cases the RMS
errors and graphical results indicate that the Lognormal best represents
the data. This observation is readily understood when we recall that
the distribution is dominated by the effect of m, the exponent in Paris'
law. The mean, standard deviation and Weibull parameter are listed in
Figs. 7, 8 and 9 with their corresponding 90% tolerance limits. The
two-parameter Weibull was considered to be a more accurate representation of
the data than the three-parameter results. The appearance of outliers
in Figs. 7 and 9 does not affect either the inference results or the selection of
the proper functional representation.
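The comparison of candidate distributions by RMS error against ranked data can be sketched as follows. Several details are assumptions made for illustration: the plotting position (i + 0.5)/n, the moment-based parameter estimates, and the omission of the Weibull fit, which would need an iterative estimate of its shape parameter.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def rms_error(data, cdf):
    """RMS difference between a model CDF and the ranked-data probabilities."""
    ranked = sorted(data)
    n = len(ranked)
    total = sum((cdf(x) - (i + 0.5) / n) ** 2 for i, x in enumerate(ranked))
    return math.sqrt(total / n)


def compare_fits(data):
    """Fit normal and lognormal models and return their RMS errors."""
    normal = NormalDist(fmean(data), stdev(data))
    logs = [math.log(x) for x in data]
    log_model = NormalDist(fmean(logs), stdev(logs))
    return {
        "normal": rms_error(data, normal.cdf),
        "lognormal": rms_error(data, lambda x: log_model.cdf(math.log(x))),
    }


# Stand-in sample of 300 positive "lifetimes" (hypothetical data).
_rng = random.Random(1)
SAMPLE = [_rng.lognormvariate(7.0, 0.8) for _ in range(300)]
ERRORS = compare_fits(SAMPLE)
```

The model with the smallest entry in `ERRORS` is the preferred representation under this criterion.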
7. CONCLUSIONS AND DISCUSSION. The combination of linear elastic
fracture  mechanics  and  an  empirical  crack  growth  law  has  become  the 
standard  method  for  the  calculation  of  fatigue  life  in  gun  tubes.  The 
fundamental  requirement  of  such  an  approach  is  that  the  gun  tube  should 
be  withdrawn  from  service  before  catastrophic  brittle  failure  can  occur. 

The  sensitivity  analysis  conducted  herein  indicates  that  in  non- 
autofrettaged  gun  tubes  the  most  sensitive  parameter  is  m,  the  exponent 
in  Paris'  crack  growth  law.  With  increasing  amounts  of  autofrettage  the 
initial  crack  length  and  proportion  of  autofrettage  are  also  significant 
factors.  For  cumulative  failure  probability,  the  lognormal  distribution 
is  superior  to  both  normal  and  Weibull  distributions  for  zero,  75%  and 
100%  autofrettage. 

The improvement in 99.9% life resulting from large amounts of autofrettage,
based on typical materials and loadings, indicates the great
advantages  which  autofrettage  may  provide.  One  particularly  important 
feature  of  the  results  reported  here  is  that  the  current  practice  of 
applying  75%  autofrettage  and  limiting  life  to  approximately  2000  rounds 
for  typical  pressures  and  gun  tube  steels,  is  consistent  with  the  results 
presented  in  Fig.  5  which  were  obtained  using  the  Monte  Carlo  scheme. 

Whilst  this  study  relates  to  a  particular,  axisymmetric  geometry 
containing  residual  stresses,  the  benefits  of  introducing  advantageous 
residual  stresses  in  more  complex  geometrical  configurations,  such  as 
pin-loaded lugs and welded joints, are already becoming apparent. This
method  of  sensitivity  analysis  would  also  be  applicable  to  such  configurations. 
Fig. 1: CRACKED THICK CYLINDER GEOMETRY

Fig. 2: PARAMETERS TESTED IN MONTE CARLO SIMULATION

Fig. 3: CONVERGENCE CHARACTERISTICS FOR MONTE CARLO SIMULATION.

Fig. 4: PROBABILITY DENSITY FOR NON-AUTOFRETTAGED TUBE WITH 40 CRACKS.

Fig. 5: PROBABILITY DENSITY FOR 75% AUTOFRETTAGED TUBE WITH 4 CRACKS.

Fig. 6: PROBABILITY DENSITY FOR 100% AUTOFRETTAGED TUBE WITH 4 CRACKS.

Fig. 8: CUMULATIVE DISTRIBUTION FUNCTIONS FOR 75% AUTOFRETTAGED TUBE WITH 4 CRACKS.

Fig. 9: CUMULATIVE DISTRIBUTION FUNCTIONS FOR 100% AUTOFRETTAGED TUBE WITH 4 CRACKS.


NUMERICAL  PREDICTION  OF  RESIDUAL  STRESSES  IN  AN 
AUTOFRETTAGED TUBE OF COMPRESSIBLE MATERIAL
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U.  S.  Army  Armament  Research  and  Development  Command 
Large  Caliber  Weapon  Systems  Laboratory 
Benet  Weapons  Laboratory 
Watervliet,  NY  12139 


ABSTRACT. The residual stresses in an autofrettaged tube of compressible
material are obtained by a new finite difference approach. The tube is
assumed to obey the Mises' yield criterion, the Prandtl-Reuss flow theory and
the isotropic-hardening rule. In order to test the accuracy of the computer
program,  a  convergence  study  for  a  nearly  incompressible  tube  has  been  made 
and  compared  with  the  exact  solution  as  well  as  the  simulated  results  for 
residual  stresses  in  an  incompressible  tube. 

1. INTRODUCTION. The importance of favorable residual stresses in an
autofrettaged tube is well known [1]. Many methods for predicting residual
stresses have been reported [2-4]. For an elastic-plastic material which
obeys the Mises' yield criterion and the associated flow rules, a closed form
solution exists only in the plane strain case neglecting strain hardening and
compressibility [5]. Recently a method to simulate this problem by thermal
loads has been devised by Hussain et al [6]. For a compressible material with
or without strain hardening, a new finite difference approach has been
developed by this author [7]. Two types of incremental loadings have been
discussed. In the present paper, the numerical prediction of residual
stresses in an autofrettaged tube of compressible material will be reported.
The  effect  of  Poisson’s  ratio  will  be  discussed.  In  order  to  test  the 
accuracy  of  the  computer  program,  a  convergence  study  for  a  nearly 
incompressible  tube  has  been  made  and  compared  with  the  exact  solution  as  well 
as  the  simulated  results  for  residual  stresses  in  an  incompressible  tube. 

2. INCOMPRESSIBLE TUBE. For an ideally-plastic incompressible tube
which obeys the Mises' yield criterion and the associated flow rules, a closed
form solution exists in the plane strain case. The residual stresses and
displacement after complete elastic unloading in a partially autofrettaged
tube are given by [5],
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a  <  r  <  p 


(3) 

p  <  r  <  b 

$$u/r = (\sqrt{3}/2)(\sigma_0/E)(\rho/r)^2 \tag{4}$$

where

$$P_1 = (1 - \rho^2/b^2 + 2 \log \rho/a)/(b^2/a^2 - 1) \tag{5}$$

and $\rho$ is the radius of the autofrettaged interface.

According  to  Hussain  et  al  [6],  the  distribution  of  radial  and  hoop 
stresses  can  be  simulated  by  a  steady  state  thermal  loading.  The  equivalence 
between  the  temperature  gradient  and  the  yield  stress  is 
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and  the  temperature  distribution  is  given  by 
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log( p/a) 


log  (r/a) 


T  =  Tr 


(6) 


(7) 


3. FINITE DIFFERENCE APPROACH. For a compressible material with or
without strain hardening, a new finite difference approach has been developed
by this author [7]. An incremental procedure is used for pressure beyond
the elastic limit and the elastic solution is used as the initial condition.
The cross section of the tube is divided into n rings and we want to determine
all incremental quantities at all grid points in each incremental step. In
the plastic region, the incremental stresses are related to the incremental
strains by the incremental form

$$\Delta\sigma_i = d_{ij}\, \Delta\epsilon_j \qquad \text{for } i, j = r, \theta, z \tag{8}$$

and

$$d_{ij}/2G = \nu/(1-2\nu) + \delta_{ij} - \sigma_i' \sigma_j'/S \tag{9}$$

where E is Young's modulus, $\nu$ is Poisson's ratio, $\delta_{ij}$ is the Kronecker delta,

$$2G = E/(1+\nu), \qquad S = \tfrac{2}{3}\left(1 + \tfrac{1}{3} H'/G\right)\bar{\sigma}^2,$$

$$\sigma_m = (\sigma_r + \sigma_\theta + \sigma_z)/3, \qquad \sigma_i' = \sigma_i - \sigma_m,$$

$$\bar{\sigma} = (1/\sqrt{2})\left[(\sigma_r - \sigma_\theta)^2 + (\sigma_\theta - \sigma_z)^2 + (\sigma_z - \sigma_r)^2\right]^{1/2} \ge \sigma_0 \tag{10}$$

and $\sigma_0$ is the yield stress in simple tension or compression. For a strain
hardening material, $H'$ is the slope of the effective stress/plastic strain
curve. For an ideally-plastic material, $H' = 0$. When $\bar{\sigma} < \sigma_0$ or $d\bar{\sigma} < 0$, the
state of stress is elastic and the third term in equation (9) disappears.
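The incremental stress-strain matrix of equations (9) and (10) can be sketched directly, reading them as $d_{ij}/2G = \nu/(1-2\nu) + \delta_{ij} - \sigma_i'\sigma_j'/S$ with $S = \tfrac{2}{3}(1 + \tfrac{1}{3}H'/G)\bar{\sigma}^2$. The function names and the explicit `plastic` flag are hypothetical; a full program would set the flag from the yield and loading conditions stated above.

```python
import math


def effective_stress(sr: float, st: float, sz: float) -> float:
    """Effective stress from the radial, hoop and axial components."""
    return math.sqrt(0.5 * ((sr - st) ** 2 + (st - sz) ** 2 + (sz - sr) ** 2))


def tangent_matrix(stress, e_mod, nu, h_prime, plastic):
    """3x3 matrix d_ij relating strain increments to stress increments.

    stress  : (sigma_r, sigma_theta, sigma_z) at the grid point
    plastic : True when the point is yielding and loading; when False the
              third term of equation (9) is dropped (elastic response).
    A plastic point must have non-zero effective stress.
    """
    two_g = e_mod / (1.0 + nu)
    g = 0.5 * two_g
    mean = sum(stress) / 3.0
    dev = [s - mean for s in stress]
    s_cap = (2.0 / 3.0) * (1.0 + h_prime / (3.0 * g)) * effective_stress(*stress) ** 2
    d = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            term = nu / (1.0 - 2.0 * nu) + (1.0 if i == j else 0.0)
            if plastic:
                term -= dev[i] * dev[j] / s_cap
            d[i][j] = two_g * term
    return d


D_ELASTIC = tangent_matrix((-0.5, 0.6, 0.05), 1.0, 0.3, 0.0, plastic=False)
D_PLASTIC = tangent_matrix((-0.5, 0.6, 0.05), 1.0, 0.3, 0.0, plastic=True)
```

For an ideally-plastic material ($H' = 0$) the plastic matrix gives stress increments that are orthogonal to the deviatoric stress, so the effective stress does not grow during yielding; this is a convenient consistency check.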

Using equation (8) and $\Delta u = r \Delta\epsilon_\theta$, there exist only two unknowns at each
station that have to be determined for each increment of loading. The unknown
variables in the present formulation are $(\Delta\epsilon_\theta)_i$, $(\Delta\epsilon_r)_i$, for $i = 1, 2, \ldots, n, n+1$.

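The equation of equilibrium and the equation of compatibility are valid
for both the elastic and the plastic regions of a thick-walled tube. The
finite-difference forms of these two equations at $i = 1, \ldots, n$ are given by

$$(r_{i+1} - 2r_i)(\Delta\sigma_r)_i - (r_{i+1} - r_i)(\Delta\sigma_\theta)_i + r_i(\Delta\sigma_r)_{i+1} = (r_{i+1} - r_i)(\sigma_\theta - \sigma_r)_i - r_i[(\sigma_r)_{i+1} - (\sigma_r)_i] \tag{11}$$

for the equation of equilibrium, and

$$(r_{i+1} - 2r_i)(\Delta\epsilon_\theta)_i - (r_{i+1} - r_i)(\Delta\epsilon_r)_i + r_i(\Delta\epsilon_\theta)_{i+1} = (r_{i+1} - r_i)(\epsilon_r - \epsilon_\theta)_i - r_i[(\epsilon_\theta)_{i+1} - (\epsilon_\theta)_i] \tag{12}$$

for the equation of compatibility.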

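With the aid of the incremental stress-strain relations (equation (8)),
equation (11) can be rewritten as

$$\begin{aligned}
&[(r_{i+1} - 2r_i)(d_{12})_i + (-r_{i+1} + r_i)(d_{22})_i](\Delta\epsilon_\theta)_i \\
&\quad + [(r_{i+1} - 2r_i)(d_{11})_i + (-r_{i+1} + r_i)(d_{21})_i](\Delta\epsilon_r)_i \\
&\quad + r_i(d_{12})_{i+1}(\Delta\epsilon_\theta)_{i+1} + r_i(d_{11})_{i+1}(\Delta\epsilon_r)_{i+1} \\
&= (r_{i+1} - r_i)(\sigma_\theta - \sigma_r)_i - r_i[(\sigma_r)_{i+1} - (\sigma_r)_i]
\end{aligned} \tag{13}$$

The boundary conditions for the problem are

$$\Delta\sigma_r(a,t) = -\Delta p, \qquad \Delta\sigma_r(b,t) = 0 \tag{14}$$

Using the incremental relations (equation (8)), we rewrite equation (14) as

$$(d_{12})_1(\Delta\epsilon_\theta)_1 + (d_{11})_1(\Delta\epsilon_r)_1 = -\Delta p \tag{15}$$

and

$$(d_{12})_{n+1}(\Delta\epsilon_\theta)_{n+1} + (d_{11})_{n+1}(\Delta\epsilon_r)_{n+1} = 0 \tag{16}$$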

Now we can form a system of 2(n+1) equations for solving 2(n+1) unknowns,
$(\Delta\epsilon_\theta)_i$, $(\Delta\epsilon_r)_i$, for $i = 1, 2, \ldots, n, n+1$. Equations (15) and (16) are taken as
the first and last equations, respectively, and the other 2n equations are set
up at $i = 1, 2, \ldots, n$ using equations (12) and (13). The final system is an
unsymmetric band matrix with the nonzero terms clustered about the main
diagonal, two below and one above.

When the total applied pressure p is given, it is natural to divide the
loading path into m equal fixed increments with Δp = (p-p*)/m, where p* is the
pressure corresponding to initial yielding. These fixed increments need not
be equal for all steps, and any sequence of m increments can be supplied by the
user. In [7], an adaptive algorithm to generate a sequence of load increments
was described.
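As a small illustration of the loading path, the sketch below generates either m equal increments or a user-weighted sequence that still sums to p - p*. The function name and the `weights` argument are hypothetical conveniences; the adaptive algorithm of [7] is not reproduced.

```python
def load_increments(p, p_star, m=None, weights=None):
    """Return pressure increments that sum to p - p_star.

    With `weights` omitted, m equal increments (p - p_star)/m are returned.
    Otherwise the increments are proportional to the supplied weights.
    """
    total = p - p_star
    if weights is None:
        return [total / m] * m
    scale = total / sum(weights)
    return [w * scale for w in weights]
```

For example, `load_increments(0.8, 0.4, m=4)` gives four increments of 0.1 each, while `load_increments(0.8, 0.4, weights=[3, 2, 1])` takes larger steps early in the loading.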

4. NUMERICAL RESULTS AND DISCUSSIONS. In order to test the accuracy of
the computer program, a convergence study for a nearly incompressible tube
(ν = .4999999) has been made and compared with the exact solution for an
incompressible tube (ν = 1/2). The numerical results for a tube with b/a = 2
and H' = 0 converge to the exact solution as n increases, as shown in Table 1
for 30, 60, and 100 percent overstrain. A comparison of the calculated
residual hoop stresses with the exact solution as well as the simulated
results is shown in Table 2. The finite difference approach can generate more
accurate results than the method of simulation by thermal load for
incompressible material. In order to discuss the effect of compressibility,
we calculate the residual stresses for a tube with b/a = 2, H' = 0, n = 400,
ν = 0, 0.3, 0.4999. The results are shown in Tables 3, 4, and 5 for residual
hoop, radial, and axial components, respectively. The effect of hardening on
the residual stresses can be discussed in a similar way. The results for a
tube with b/a = 2, ν = 0.3, n = 400, H'/E = 0, 1/9, 1/19 (ω = E_t/E = 0, 0.05,
0.10) are shown in Tables 6, 7, and 8 for residual hoop, radial, and axial
components, respectively. It can be seen that the effect of hardening on
residual hoop stress is larger than that of compressibility.
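The exact row of Table 1 can be reproduced from the standard closed-form solution for an incompressible, elastic-perfectly-plastic tube in plane strain obeying the von Mises condition. The paper does not print these formulas, so the sketch below is an assumption about how the exact values were obtained; it also assumes that the percent overstrain is the fraction of the wall thickness that has yielded. The function name is hypothetical.

```python
import math

def exact_incompressible(overstrain, b_over_a=2.0):
    """Closed-form results for a partially plastic incompressible tube.

    overstrain: yielded fraction of the wall, 0..1 (assumed definition).
    Returns (p/sigma0, max hoop/sigma0, inside axial/sigma0, E*u_a/(sigma0*a)).
    """
    a, b = 1.0, b_over_a
    rho = a + overstrain * (b - a)            # elastic-plastic interface radius
    k = 1.0 / math.sqrt(3.0)                  # shear yield stress / sigma0
    p = k * (1.0 - rho**2 / b**2 + 2.0 * math.log(rho / a))
    hoop_max = k * (1.0 + rho**2 / b**2)      # hoop stress at r = rho
    axial_inside = k - p                      # (sigma_r + sigma_theta)/2 at r = a
    u_a = 0.5 * math.sqrt(3.0) * rho**2 / a**2
    return p, hoop_max, axial_inside, u_a
```

Evaluating the function at 0.3, 0.6, and 1.0 gives values that agree with the starred rows of Table 1 to the five decimals printed there, which supports the assumed formulas.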
TABLE 1. CONVERGENCE STUDY FOR A NEARLY INCOMPRESSIBLE TUBE UNDER
INTERNAL PRESSURE (b/a = 2, H' = 0, ν = .4999999)

O.S.     n      p/σ0      max σθ/σ0   inside σz/σ0   E u_a/(σ0 a)

30%      10     .64630     .80697      -.06895        1.54781
         20     .64099     .81444      -.06364        1.50104
         50     .63815     .81861      -.05080        1.47764
         100    .63725     .81996      -.05990        1.47047
         200    .63681     .82062      -.05946        1.46699
         400    .63659     .82095      -.05924        1.46528
         *      .63637     .82128      -.05902        1.46358

60%      10     .77375     .93345      -.19640        2.49329
         20     .76123     .94049      -.18388        2.33897
         50     .75464     .94438      -.17729        2.26259
         100    .75257     .94563      -.17522        2.23922
         200    .75156     .94625      -.17421        2.22805
         400    .75105     .94655      -.17371        2.22251
         *      .75056     .94685      -.17321        2.21703

100%     10     .82096    1.15470      -.24361        4.14111
         20     .80999    1.15470      -.23264        3.76669
         50     .80408    1.15470      -.22673        3.57791
         100    .80221    1.15470      -.22486        3.51990
         200    .80129    1.15470      -.22394        3.49173
         400    .80083    1.15470      -.22348        3.47785
         *      .80038    1.15470      -.22303        3.46410

* Exact solution.




TABLE 2. A COMPARISON OF RESIDUAL HOOP STRESS (σθ/σ0) FOR b/a = 2, H' = 0

O.S.     r/a     ν = .5       n = 400       ν = .3000
                 Exact        ν = .4999     Simulation

30%      1.0     -0.54224     -0.54317      -0.54645
         1.1     -.28497      -.28582       -.29157
         1.2     -.07250      -.07329       -.08021
         1.3     +.10709      +.10636       +.09897
         1.4     +.09672      +.09587       +.08962
         1.5     +.08835      +.08774       +.08205
         1.6     +.08150      +.08056       +.07582
         1.7     +.07583      +.07487       +.07065
         1.8     +.07107      +.07102       +.06630
         1.9     +.06705      +.06610       +.06261
         2.0     +.06361      +.06267       +.05945

60%      1.0     -0.84679     -0.84865      -0.85480
         1.1     -.56305      -.56468       -.57384
         1.2     -.33048      -.33191       -.34250
         1.3     -.13525      -.13652       -.14766
         1.4     +.03190      +.03076       +.01955
         1.5     +.17737      +.17635       +.16534
         1.6     +.30575      +.30483       +.29416
         1.7     +.28446      +.28345       +.21408
         1.8     +.26662      +.26555       +.25270
         1.9     +.25152      +.25042       +.24288
         2.0     +.23863      +.23752       +.23062

100%     1.0     -0.97964     -0.98130      -0.99326
         1.1     -.68437      -.68579       -.70058
         1.2     -.44303      -.44425       -.46027
         1.3     -.24098      -.24203       -.25841
         1.4     -.06842      -.06933       -.08559
         1.5     +.08142      +.08063       +.06474
         1.6     +.21338      +.21268       +.19729
         1.7     +.33099      +.33037       +.31553
         1.8     +.43681      +.43634       +.42205
         1.9     +.53306      +.53259       +.51886
         2.0     +.62111      +.62069       +.60749
TABLE 3. THE EFFECT OF COMPRESSIBILITY ON THE RESIDUAL STRESS σθ/σ0
(b/a = 2, H' = 0, n = 400)

O.S.     r/a     ν = .4999    ν = .3000     ν = .0000

30%      1.0     -0.54317     -0.53992      -0.51455
         1.1     -.28528      -.28233       -.25808
         1.2     -.07329      -.07127       -.05712
         1.3     +.10636      +.16389       +.09593
         1.4     +.09587      +.09358       +.08647
         1.5     +.08774      +.08530       +.07887
         1.6     +.08056      +.07854       +.07266
         1.7     +.07487      +.07297       +.06753
         1.8     +.07012      +.06831       +.06324
         1.9     +.06610      +.06437       +.05962
         2.0     +.06267      +.06102       +.05653

60%      1.0     -0.84865     -0.84138      -0.80090
         1.1     -.56468      -.55776       -.51977
         1.2     -.33191      -.32513       -.28850
         1.3     -.13652      -.13036       -.09892
         1.4     +.03076      +.03487       +.05298
         1.5     +.17635      +.17160       +.17167
         1.6     +.30483      +.29721       +.26278
         1.7     +.28345      +.27635       +.24434
         1.8     +.26555      +.25889       +.22892
         1.9     +.25042      +.24414       +.21587
         2.0     +.23752      +.23155       +.20474

100%     1.0     -0.98130     -0.97388      -0.92931
         1.1     -.68579      -.67902       -.63864
         1.2     -.44425      -.43792       -.40015
         1.3     -.24203      -.23600       -.20018
         1.4     -.06933      -.06370       -.03171
         1.5     +.08063      +.08531       +.10837
         1.6     +.21268      +.21530       +.22222
         1.7     +.33037      +.32918       +.31266
         1.8     +.43634      +.42900       +.38327
         1.9     +.53259      +.51654       +.43768
         2.0     +.62069      +.59296       +.47918
TABLE 4. THE EFFECT OF COMPRESSIBILITY ON THE RESIDUAL STRESS σr/σ0
(b/a = 2, H' = 0, n = 400)

O.S.     r/a     ν = .4999    ν = .3000     ν = .0000

30%      1.0      0.00000      0.00000       0.00000
         1.1     -.03732      -.03684       -.25808
         1.2     -.04891      -.04812       -.04462
         1.3     -.04368      -.04287       -.03940
         1.4     -.03320      -.03255       -.02994
         1.5     -.02477      -.02427       -.02234
         1.6     -.01789      -.01752       -.01613
         1.7     -.01220      -.01194       -.01100
         1.8     -.00744      -.00728       -.00671
         1.9     -.00342      -.00335       -.00309
         2.0      0.00000      0.00000       0.00000

60%      1.0      0.00000      0.00000       0.00000
         1.1     -.06371      -.66302       -.05957
         1.2     -.09539      -.09415       -.08795
         1.3     -.10581      -.10413       -.09578
         1.4     -.10182      -.09986       -.09032
         1.5     -.08797      -.08598       -.07658
         1.6     -.06731      -.06566       -.05803
         1.7     -.04593      -.04480       -.03960
         1.8     -.02804      -.02735       -.02417
         1.9     -.01291      -.01259       -.01113
         2.0      0.00000      0.00000       0.00000

100%     1.0      0.00000      0.00000       0.00000
         1.1     -.07520      -.07453       -.07076
         1.2     -.11561      -.11444       -.10780
         1.3     -.13282      -.13125       -.12233
         1.4     -.13424      -.13235       -.12165
         1.5     -.12474      -.12262       -.11079
         1.6     -.10764      -.10541       -.09335
         1.7     -.08523      -.08307       -.07197
         1.8     -.05911      -.05728       -.04850
         1.9     -.03042      -.02929       -.02423
         2.0      0.00000      0.00000       0.00000
TABLE 5. THE EFFECT OF COMPRESSIBILITY ON THE RESIDUAL STRESS σz/σ0
(b/a = 2, H' = 0, n = 400)

O.S.     r/a     ν = .4999    ν = .3000     ν = .0000

30%      1.0     -0.27153     -0.15819      +0.01264
         1.1     -.16153      -.08015       +.03964
         1.2     -.06108      -.02272       +.02990
         1.3     +.03133      +.01831       +.00000
         1.4     +.03133      +.01831       +.00000
         1.5     +.03133      +.01831       +.00000
         1.6     +.03133      +.01831       +.00000
         1.7     +.03133      +.01831       +.00000
         1.8     +.03133      +.01831       +.00000
         1.9     +.03133      +.01831       +.00000
         2.0     +.03133      +.01831       +.00000

60%      1.0     -0.42426     -0.28532      -0.07295
         1.1     -.31413      -.18422       +.01377
         1.2     -.21360      -.10266       +.06134
         1.3     -.12112      -.03886       +.07385
         1.4     -.03551      +.00946       +.06108
         1.5     +.04419      +.04477       +.03373
         1.6     +.11874      +.06946       +0.00000
         1.7     +.11874      +.06946       +0.00000
         1.8     +.11874      +.06946       +0.00000
         1.9     +.11874      +.06946       +0.00000
         2.0     +.11874      +.06946       +0.00000

100%     1.0     -0.49052     -0.36683      -0.16290
         1.1     -.38037      -.25538       -.05105
         1.2     -.27984
TABLE 6. THE EFFECT OF HARDENING ON THE RESIDUAL STRESS σθ/σ0
(b/a = 2, ν = .3, n = 400)

TABLE 7. THE EFFECT OF HARDENING ON THE RESIDUAL STRESS σr/σ0
(b/a = 2, ν = .3, n = 400)

TABLE 8. THE EFFECT OF HARDENING ON THE RESIDUAL STRESS σz/σ0
(b/a = 2, ν = .3, n = 400)
DYNAMIC GUN TUBE BENDING ANALYSIS

ABSTRACT. A simulation is presented of a gun barrel and its support at
the trunnion. The simulation was programmed on an EAI 781 hybrid computer
and driven by vehicle ride data recorded on magnetic tape during field tests.
Errors due to dynamic gun tube bending are presented.

OBJECTIVE. Our objective is to evaluate the error due to gun tube flexure
introduced by vehicle motions while firing-on-the-move. The analysis is
applicable to the dynamic bending of any beam-like structure.

INTRODUCTION. In recent years there has been an increased emphasis on
firing a combat vehicle's main weapon while the vehicle is moving. This has
been called "firing-on-the-move" (FOM). Stabilization systems were added to
vehicles that were designed to perform accurate stationary firing, with the
idea that stabilizing the gun in elevation and azimuth would allow the vehicle
to perform accurate firing while moving. However, this was not the case.
Errors occur while firing-on-the-move that are not significant when firing
from a stationary vehicle. Some sources of these errors are the horizontal and
vertical vehicle velocities, stabilization errors, combined pitching and
rolling motions, and gun tube flexure. This report is concerned with
evaluating the error due to gun tube flexures that are introduced by vehicle
motions.

A gun tube can bend or take a non-uniform shape due to disturbances or
phenomena that are not vehicle introduced. These can be caused by firing
the gun or by sunlight heating one side of the gun tube. These errors are
not included in this simulation. The static or quasi-static error caused by
thermal gradients in the tube is corrected for in current vehicles with a
muzzle reference system. This system has a small mirror mounted on the muzzle
end of the tube. A light beam is reflected off the mirror to align the sight
with the tube muzzle. This system performs very well for these quasi-static
corrections but cannot be used for dynamic tube leveling on the moving vehicle.

It is extremely difficult to measure the dynamic bending of a gun tube in
a vehicle traversing cross-country terrain. A one-mil angular bending error in
a tube will produce approximately a five-foot error when firing at a target
1600 meters away. This is a significant error, and one must measure the tube
bending to considerably less than one mil. To give some indication of how
small this angle is, a golf ball viewed from one football field away subtends
about 0.3 mils.
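The five-foot figure can be checked with a short calculation. The sketch below assumes the artillery mil of 1/6400 of a circle, which the text does not state explicitly; the function and constant names are illustrative only.

```python
import math

MIL_RAD = 2.0 * math.pi / 6400.0   # assumed definition: 6400 mils per circle
FT_PER_M = 1.0 / 0.3048

def miss_distance_m(angle_mils, range_m):
    """Linear miss distance at the target for a given angular error."""
    return range_m * math.tan(angle_mils * MIL_RAD)

def angle_mils_for(miss_m, range_m):
    """Angular error that produces a given linear miss at the target."""
    return math.atan(miss_m / range_m) / MIL_RAD
```

Under that assumption, `miss_distance_m(1.0, 1600.0) * FT_PER_M` is a little over five feet, and `angle_mils_for(1.25, 1600.0)` shows that the edge of a 2.5 meter target aimed at its center lies about 0.8 mil off the aim point.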

The derivation of the equations that were programmed on the computer is
given in Appendix A. The equations and computer programs are in a general form
and are applicable to any symmetrical gun tube. Realistic dimensions and
material data were chosen for the analysis. The gun tube was separated into
eighteen finite elements, each having uniform characteristics over its
length. One thing to note in the equations is that the gun tube rigidity
increases as the fourth power of the diameter. Thus, larger caliber gun tubes
are considerably more rigid than small ones.
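The fourth-power dependence follows from the area moment of inertia of a hollow circular section, I = pi (D^4 - d^4) / 64. The sketch below is an illustration with made-up dimensions; the actual tube dimensions used in the study are not given in this section.

```python
import math

def area_moment_tube(outer_d, inner_d):
    """Area moment of inertia of a hollow circular section about a diameter."""
    return math.pi * (outer_d**4 - inner_d**4) / 64.0

def flexural_rigidity(E, outer_d, inner_d):
    """EI for one uniform element of the tube."""
    return E * area_moment_tube(outer_d, inner_d)

# Hypothetical dimensions: scaling both diameters by 2 multiplies EI by 16.
ratio = flexural_rigidity(1.0, 0.30, 0.21) / flexural_rigidity(1.0, 0.15, 0.105)
```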

The  model  was  implemented  and  solved  on  a  hybrid  computer.  The  gun  was 
modeled  on  an  analog  computer  and  forcing  functions  were  supplied  by  the 
digital  computer  via  D/A.  The  vehicle  ride  was  obtained  from  magnetic  tape 
recordings  of  field  data.  These  rides  were  digitized  and  stored  in  the  digital 
computer  for  use  as  the  gun  forcing  functions.  The  input  into  the  gun  was  only 
in  the  vertical  direction;  consequently,  the  error  data  presented  are  for  the 
gun  tube  flexure  in  a  vertical  plane.  In  reality,  there  is  some  flexing  in  the 
horizontal  direction  but  that  is  not  considered  here. 

DISCUSSION. The purpose of this study was to measure by computer
techniques the muzzle error at a mile range of a gun barrel subjected to
dynamic inputs at the trunnion. To simulate the gun tube, it was divided into
sections to analyze its response using Euler's equation for the flexure of a
beam.

The equations of motion as applied to the sectioned tube are as follows:

1. Basic equation for gun barrel without support:

$$M_L \ddot{Y}_L = \frac{2(EI)_L}{X_L^3}\,(Y_{L+1} - 2Y_L + Y_{L-1}) - \frac{(EI)_{L+1}}{X_L X_{L+1}^2}\,(Y_{L+2} - 2Y_{L+1} + Y_L) - \frac{(EI)_{L-1}}{X_L X_{L-1}^2}\,(Y_L - 2Y_{L-1} + Y_{L-2})$$

WHERE:  L = Subscript to designate the section
        M = Mass
        E = Modulus of elasticity
        I = Moment of inertia
        X = Length
        Y = Vertical displacement
        $\ddot{Y}$ = Vertical acceleration

2.  Basic  equations  for  gun  barrel  with  support  acting  on  1st,  2nd,  and  11th 
sections: 

a. 1st Section

$$M_1 \ddot{Y}_1 = -\frac{(EI)_2}{X_1 X_2^2}\,(Y_3 - 2Y_2 + Y_1) - K_s Y_1$$

WHERE:  K_s = Spring constant of support (12,200 lb/in)
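b. 2nd Section

$$M_2 \ddot{Y}_2 = \frac{2(EI)_2}{X_2^3}\,(Y_3 - 2Y_2 + Y_1) - \frac{(EI)_3}{X_2 X_3^2}\,(Y_4 - 2Y_3 + Y_2) - K_s Y_2$$

c. 11th Section

$$M_{11} \ddot{Y}_{11} = \frac{2(EI)_{11}}{X_{11}^3}\,(Y_{12} - 2Y_{11} + Y_{10}) - \frac{(EI)_{12}}{X_{11} X_{12}^2}\,(Y_{13} - 2Y_{12} + Y_{11}) - \frac{(EI)_{10}}{X_{11} X_{10}^2}\,(Y_{11} - 2Y_{10} + Y_9) - K_s Y_{11}$$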

NOTE: A detailed development of the equations of motion is given in
Appendix A.
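The sectioned-tube equations above were solved on the analog computer, but their right-hand side is easy to state digitally. The following is a minimal sketch under stated assumptions: the bending moment is taken as zero on both end sections (the end condition of Appendix A), supports are modeled as linear springs on the listed sections, and all names (`section_accelerations`, `supported`) are hypothetical.

```python
def section_accelerations(Y, EI, X, M, Ks=0.0, supported=()):
    """Vertical acceleration of every section of the discretized tube.

    Y, EI, X, M: per-section displacement, rigidity, length, and mass.
    supported: zero-based indices of sections carrying a spring of rate Ks.
    """
    n = len(Y)
    # Bending moment per section; end sections carry no moment.
    moment = [0.0] * n
    for i in range(1, n - 1):
        moment[i] = EI[i] * (Y[i - 1] - 2.0 * Y[i] + Y[i + 1]) / X[i] ** 2
    acc = []
    for i in range(n):
        left = moment[i - 1] if i > 0 else 0.0
        right = moment[i + 1] if i < n - 1 else 0.0
        # Net force = -(second difference of moment) / section length
        force = -(left - 2.0 * moment[i] + right) / X[i]
        if i in supported:
            force -= Ks * Y[i]
        acc.append(force / M[i])
    return acc
```

For the configuration described in the text, the supported sections 1, 2, and 11 would be passed as `supported=(0, 1, 10)`. A straight (rigid-body) tube shape produces zero bending accelerations, which is a quick sanity check on the sign conventions.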

The  equations  of  motion  were  simulated  on  the  analog  portion  of  the  hybrid 
computer.  A  typical  analog  circuit  that  generates  sections  1,  2,  and  3  is 
shown  in  Figure  2. 

The  muzzle  error  due  to  the  flexure  of  the  gun  tube  has  two  components, 
one  based  on  the  bending  displacement  and  one  based  on  the  rate  of  change  of 
that  bending.  We  refer  to  these  as  angular  error  and  velocity  error  and  their 
sum  as  total  error.  If  the  tube  were  completely  rigid,  this  error  would  be 
zero.  Bending  from  gravity  occurs,  but  since  the  error  from  this  is  well-known 
and  compensated  for,  it  is  removed  prior  to  a  simulation  run. 

At the start of a simulation run, the static error due to analog noise
was measured and removed. The model was run 100 times slower than real time,
and 20 sample measurements of the error were taken each second to avoid
interference from the natural frequency of the tube, which was approximately
500 Hz. Seven and one-half seconds of each ride were studied to obtain a
representative sampling of the error. The vertical displacements of the
trunnion were input dynamically, and the resulting error measurements were
saved in computer storage for processing after the run. Refer to Appendix B
for details on the trunnion inputs.

Six different vehicle rides were studied, each with and without the
additional support. For each of the types of error collected, distributions
were determined with regard to the gun aiming at a target 1600 meters distant.
The range of error was divided into classes, and histograms of the frequency
with which the error fell into each class were made. Time histories of the
total error were also plotted.
Hit probability curves were generated based on each type of error. For
ten selected target sizes, the percentage of hits, given the measured errors,
was calculated. A smooth curve was fit through the ten target size points.
Since an enemy tank would be approximately 2.5 meters high, hit probabilities
for this particular target size are displayed in Figure 3.
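The hit-percentage calculation can be sketched as follows. This is a simplified illustration, assuming the gun is aimed at the center of the target and that a sample counts as a hit when the vertical miss at the target is within half the target height; the function name and the sample data are hypothetical.

```python
def hit_percentage(errors_m, target_height_m):
    """Percentage of error samples that fall on a centered target.

    errors_m: vertical miss distances at the target, in meters.
    """
    half = target_height_m / 2.0
    hits = sum(1 for e in errors_m if abs(e) <= half)
    return 100.0 * hits / len(errors_m)

# Hypothetical error samples (meters at 1600 m range), for illustration only.
samples = [0.5, -1.0, 1.3, 2.0]
curve = [(h, hit_percentage(samples, h)) for h in (1.0, 2.5, 5.0)]
```

Repeating the calculation over several target heights, as in `curve`, gives the points through which a smooth hit probability curve would be drawn.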

A major concern was the relative contribution of the velocity error, as a
compensating system for this does not yet exist. For all the rides studied,
the velocity error averaged 3.2 percent of the total error without the support
and 15.6 percent with the support. In the latter case, the increase is
probably due to the higher total accuracy of the system with the extra support.
However, in both cases, the contribution is minor. These results are displayed
in Figure 4.

The effect of the additional support can be seen easily in Figure 3. For
the 2.5 meter target, hit probability increased from an average of 12.9
percent to an average of 79.3 percent. This large improvement in performance
shows that if firing on the move is desired, additional rigidity of the gun
barrel will greatly reduce the error caused by the dynamic motion of the
vehicle.

CONCLUSIONS. To perform accurate firing on the move, the gun tube flexure
due to vehicle motion must be considered.

For the rides and gun used in this simulation, traversing Course 4 at
7 mph resulted in the gun being on a 2.5 meter target 1600 meters away less
than 10 percent of the time. This error was due only to gun tube bending; the
sight and breech end of the gun were pointing at the center of the target.

Providing a rigid support for the gun tube increased the hit probability
for the "bending" condition by a factor greater than 7.

Providing a rigid support for a gun tube will significantly decrease the
bending error.

The tube bending error is due almost entirely to the tube's angular
position. The error due to muzzle velocity was insignificant.

For some conditions the gun tube bending error can be the most significant
error occurring while firing on the move.
APPENDIX A

Equations of Motion Derivation

Euler's equation for the flexure of a beam:

The slope across an element i is given by:

$$\frac{dY_i}{dX_i} = \frac{Y_{i-1} - Y_i}{\Delta X_i} \tag{A-1}$$

$Y_i$ is the vertical distance moved for element i from an arbitrary reference
line.

The second derivative, or rate of change of slope, is the difference between
the slopes at the left and right faces of the element, i.e.,

$$\frac{d^2Y_i}{dX_i^2} = \frac{Y'_i - Y'_{i+1}}{\Delta X_i} \tag{A-3}$$

where the prime denotes the derivative. Then:

$$\frac{d^2Y_i}{dX_i^2} = \frac{Y_{i-1} - 2Y_i + Y_{i+1}}{\Delta X_i^2} \tag{A-4}$$


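The bending moment at each element is given by:

$$M_i = (EI)_i \frac{d^2Y_i}{dX_i^2} \tag{A-5}$$

Then, for EI constant over element i, the bending moment of element i is given
by:

$$M_i = \frac{(EI)_i\,(Y_{i-1} - 2Y_i + Y_{i+1})}{\Delta X_i^2} \tag{A-6}$$

Euler's equation states:

$$\frac{d^2M}{dX^2} = -\frac{w}{g}\,\frac{d^2Y}{dt^2} \tag{A-7}$$

To take the derivative of the bending moment, EI must be constant over the
element.

The rate of change of bending moment over the element is given by:

$$\frac{dM_i}{dX_i} = \frac{M_{i-1} - M_i}{\Delta X_i} \tag{A-8}$$

The second derivative is then given by:

$$\frac{d^2M_i}{dX_i^2} = \frac{M'_i - M'_{i+1}}{\Delta X_i} \tag{A-9}$$

Then:

$$\frac{d^2M_i}{dX_i^2} = \frac{M_{i-1} - 2M_i + M_{i+1}}{\Delta X_i^2} \tag{A-10}$$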
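Writing each moment equation:

$$M_{i-1} = \frac{(EI)_{i-1}\,(Y_{i-2} - 2Y_{i-1} + Y_i)}{\Delta X_{i-1}^2} \tag{A-11}$$

$$M_i = \frac{(EI)_i\,(Y_{i-1} - 2Y_i + Y_{i+1})}{\Delta X_i^2} \tag{A-12}$$

$$M_{i+1} = \frac{(EI)_{i+1}\,(Y_i - 2Y_{i+1} + Y_{i+2})}{\Delta X_{i+1}^2} \tag{A-13}$$

The mass of each element is the mass per unit length times the length of the
element:

$$M_i = \frac{w_i}{g}\,\Delta X_i \tag{A-14}$$

Euler's equation is then written as:

$$\frac{d^2Y_i}{dt^2} = -\frac{(EI)_{i-1}\,(Y_{i-2} - 2Y_{i-1} + Y_i)}{M_i\,\Delta X_i\,\Delta X_{i-1}^2} + \frac{2(EI)_i\,(Y_{i-1} - 2Y_i + Y_{i+1})}{M_i\,\Delta X_i^3} - \frac{(EI)_{i+1}\,(Y_i - 2Y_{i+1} + Y_{i+2})}{M_i\,\Delta X_i\,\Delta X_{i+1}^2} \tag{A-15}$$

where $M_i$ in the denominators is the element mass of equation (A-14).

Evaluating the end conditions:

There is no bending moment on the end element:

$$\frac{d^2M_{end}}{dX_{end}^2} = \frac{M_{i+1}}{\Delta X_i^2} \tag{A-16}$$

Second from end:

$$\frac{d^2M_{end+1}}{dX_{end+1}^2} = \frac{-2M_i + M_{i+1}}{\Delta X_i^2} \tag{A-17}$$

The opposite end:

$$\frac{d^2M_{end}}{dX_{end}^2} = \frac{M_{i-1}}{\Delta X_i^2} \tag{A-18}$$

and

$$\frac{d^2M_{end-1}}{dX_{end-1}^2} = \frac{M_{i-1} - 2M_i}{\Delta X_i^2} \tag{A-19}$$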

APPENDIX  B 

Trunnion  Movement  Generation 


A 14-channel magnetic tape of analog field data from tests conducted at
Fort Knox, Kentucky was used to provide center-of-gravity vertical, pitch,
and roll acceleration signals. These were combined to produce a vertical
trunnion acceleration signal, which was digitally sampled 100 times per
second using a high-speed analog-to-digital converter and stored for later
use.


The  analog  simulation  was  run  100  times  slower  than  real  time  to  obtain  a 
more  accurate  simulation.  This  also  allowed  us  to  observe  high-frequency  gun 
tube  movements  which  would  have  been  difficult  to  follow  with  the  naked  eye. 

The acceleration signal was digitally integrated twice to provide a
displacement signal, which was applied to the trunnion during simulation. The
displacement signal was input to the feedback inverter of the supports
(see Figure 2), which caused the displacement of the gun to match the driving
signal.

Figure  5  shows  a  typical  displacement  signal  used  as  input  to  the  trunnion. 
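The double integration of the sampled acceleration can be sketched as below. The trapezoidal rule and the zero initial velocity and displacement are assumptions made for illustration; the integration scheme actually used on the digital computer is not stated.

```python
def integrate_twice(acc, dt=0.01):
    """Integrate sampled acceleration twice with the trapezoidal rule.

    acc: acceleration samples; dt: sample interval (0.01 s for 100 samples
    per second). Starts from zero velocity and zero displacement (assumed).
    Returns the displacement samples.
    """
    vel = [0.0]
    disp = [0.0]
    for k in range(1, len(acc)):
        vel.append(vel[-1] + 0.5 * (acc[k - 1] + acc[k]) * dt)
        disp.append(disp[-1] + 0.5 * (vel[-2] + vel[-1]) * dt)
    return disp
```

In practice, recorded accelerations carry offsets that make a twice-integrated signal drift, so some detrending or high-pass filtering is usually needed before the displacement is used as a forcing function; the text does not say how this was handled.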
[Figure 5. Trunnion input displacement signal: HIMAG, Course 3, 15 mph.]


FINITE  ELEMENT  MODELING  OF  THE  VULNERABILITY  OF  AN  M-15  LAND  MINE  USING 

AN  EXPLICIT  INTEGRATION  SCHEME 


Frederick  H.  Gregory 
U.S. Army Ballistic Research Laboratory
U.S.  Army  Armament  Research  and  Development  Command 
Aberdeen  Proving  Ground,  Maryland  21005 


ABSTRACT. A finite element model of the body of an M-15 land mine has
been formulated using an axisymmetric, two-dimensional mesh with both rigid
and nonlinear spring base support boundary conditions to simulate the soil.
This model has been analyzed with the ADINA finite element structural response
code. An analysis of various implicit/explicit time integration schemes
showed that the explicit central difference time marching method gave the
best solution in terms of displacements and stresses. A numerical study was
conducted to determine the optimum time step for which convergence of the
solution was obtained.

The two basic materials of which the mine is composed, steel and high
explosive, were assumed to have nonlinear constitutive material models. The
steel case was found to be markedly inhomogeneous via 1-D tensile tests of
specimens cut from various areas of the case. This material was modeled with
a bilinear stress-strain curve, von Mises yield condition, and kinematic
hardening rule. A tension cut-off elastic-plastic model of the explosive,
which employed a bulk modulus versus volume strain relation, was derived from
a Mie-Gruneisen shock wave equation of state. This model allowed a tension
cut-off plane to form in a direction normal to the principal tensile stress
whenever the strain initially exceeded 0.1% in tension.

Solution of this problem out to 2 msec of real time required about 4 hours
of cpu time on the CDC 7600 computer for a transient shock load imposed on the
top and sides of the mine. Failure of the mine case was predicted, based on
a comparison of the value of the three-dimensional second invariant of plastic
strain with that of the one-dimensional value measured in the tensile tests.

1. INTRODUCTION. This paper describes the response of an antitank mine
to a transient blast load. The rationale for this analysis is the need to
develop a remote, expeditious means of clearing a path through an enemy mine
field. A technique has been suggested (Ref. 1) by which a relatively large
transient pressure is delivered to the surface of the earth by means of
explosives. The object of this study was to determine the extent of structural
damage to an M-15 mine body from a given level of blast wave amplitude and
shape. The principal kill mechanism is to be a serious distortion or rupture
of the mine body. It was not intended that advantage be taken of some nuance
of component design such as fuze initiation or pressure plate removal, etc.
Some considerations such as the latter have been examined in Ref. 2.

The  paper  is  divided  into  four  major  areas  as  follows:  (a)  problem 
definition,  (b)  determination  of  material  properties  and  selection  of  failure 
criteria,  (c)  finite  element  model  description  and  calculations,  and  (d) 
analysis  of  predicted  response. 
2. PROBLEM DEFINITION.


A. M-15 Antitank Mine Description. The M-15 mine has a cylindrical
steel body with a primary fuze well in the center of the top and two secondary
fuze wells, one on the side and one on the bottom. The center of the top of
the mine has a depressed area which houses the pressure plate assembly.
Drawings of the mine are shown in Figures 1 and 2. The mine has a nominal
diameter of 32.13 cm, height of 9.88 cm, and weighs 14.3 kg.

The mine body is made essentially of two pieces of WD-1010 steel which
are joined at the lower periphery by a 360° crimp. The upper part of the
mine body is formed by a deep drawing operation which results in very
inhomogeneous material properties. The central cavity shown in the lower
halves of Figures 1 and 2 is filled with 10 kilograms of composition B
explosive. This filling operation is done with the explosive in a molten
state.

The  normal  method  of  activation  of  the  fuze  is  by  means  of  force  applied 
to  the  pressure  plate  (1250  to  2000  newtons)  which  in  turn  is  transferred  to 
the  belleville  springs.  At  a  certain  deflection,  the  belleville  springs  snap 
through,  driving  the  firing  pin  into  the  detonator.  The  explosion  of  the 
detonator  activates  the  tetryl  booster  which  in  turn  detonates  the  primary 
composition  B  charge.  There  are  two  auxiliary  fuze  wells  on  the  M-15  mine 
which give it an anti-disturbance capability (see Figure 2).

B. Guidelines for the Numerical Model. In keeping with the
philosophy of identifying a failure mechanism which is as general as possible
and  is  not  dependent  upon  some  specific  design  feature,  the  pressure  plate, 
fuze,  and  belleville  springs  were  omitted  from  the  finite  element  model.  This 
was  done  in  consonance  with  the  previously  stated  guideline  of  not  identifying 
failures  of  the  fuze  components.  The  part  of  the  mine  which  constitutes  our 
model  is  shown  in  the  lower  part  of  Figure  2,  not  including  the  secondary  fuzes 
and  filling  hole. 

There  are  a  large  number  of  antitank  mines,  both  foreign  and  of  U.S. 
manufacture,  which  consist  basically  of  a  round  thin  metal  body  filled  with 
explosive.  This  type  of  antitank  mine  constitutes  a  large  part  of  the 
inventory of U.S. and Soviet mines. The component which is most distinctive
is the fuze mechanism. There are a variety of radically different fuzes for
these mines, differing both in mechanical design and in the particular
signature of combat tanks which is required to activate the fuze train.
Therefore, the numerical model adopted for the M-15 mine is representative of
a basic, necessary component of a large class of both foreign and U.S. mines.

The  auxiliary  fuze  wells  were  eliminated  from  the  finite  element  model 
for  two  reasons.  First,  these  fuze  wells  make  the  mine  body  more  susceptible 
to  damage  due  to  stress  concentrations  which  occur  in  the  neighborhood  of  the 
joint between the secondary fuze and the mine body. Thus, the simplified
model is more conservative in terms of the blast load required for mine defeat.
Secondly,  the  inclusion  of  these  wells  in  the  finite  element  mesh  would  have 
necessitated  the  use  of  a  three-dimensional  (3-D)  finite  element  model.  The 
3-D  model  would  have  required  a  very  large  increase  in  the  amount  of  computing 
time used in obtaining the dynamic response of the structure. The four
lobe-shaped dimples in the base of the mine body shown in Figure 2 were omitted
for the same reasons. The result of these simplifications was a 2-D model
in which we need consider only a pie-shaped section of the axisymmetric
body.

Figure 1. U.S. M-15 Antitank Mine (labels: arming plug, fuze M603)

Figure 2. M-15 Mine Base Assembly (labels: secondary fuze, bottom, shown out
of position; crimp)


C. Base Support and Surface Loading. When employed in the field,
mines of the M-15 type are usually buried and covered with a shallow layer of
soil for concealment. In a few cases, the mine may be placed on the surface
and covered with grass, leaves, etc., for concealment. In either case, the
mine  will  experience  a  transient  pressure  load  on  its  top  surface  when  the 
countermine  explosive  is  detonated  in  the  vicinity.  For  buried  mines,  the 
sides  will  experience  a  lesser  pressure  pulse,  the  magnitude  of  which  will 
depend  on  several  factors  such  as  how  well  the  soil  is  tamped,  type  of  soil, 
and depth of burial. The base of the mine will pick up a load at a later
time than the top surface. The magnitude and shape of this
base  support  will  depend  upon  the  downward  acceleration/movement  of  the  mine 
and  the  dynamic  properties  of  the  soil  medium. 

The  method  used  to  simulate  the  soil  support  was  by  means  of  nonlinear, 
upward  acting  springs.  The  detailed  description  will  be  given  in  the  next 
Section.  This  calculation  will  be  referred  to  as  Case  A. 

In  order  to  compare  the  predicted  structural  response  with  experimental 
data  presented  in  Reference  1,  a  second  base  support  condition  was  used  in 
which  the  motion  of  the  base  in  the  vertical  direction  was  restrained.  This 
support  condition  will  be  referred  to  as  Case  B. 

The  boundary  support  configurations  for  Cases  A  and  B  are  shown  in 
Figure  3.  Also  shown  there  are  the  portions  of  the  surface  that  were  loaded. 
In  Case  A,  the  mine  is  simulated  as  being  buried  in  soil  up  to  its  top 
surface;  whereas,  in  Case  B  it  is  assumed  sitting  on  a  rigid  surface. 

Reference  1  describes  some  experiments  conducted  with  mine  clearance 
types  of  explosives.  In  these  experiments,  two  types  of  U.S.  mines  were 
exposed  to  the  resulting  blast  loading.  The  pressure  pulse  used  in  this  paper 
was  designed  to  simulate  the  peak  pressure  and  impulse  measured  in  these 
experiments.  The  peak  pressure  was  13.8  MPa  and  the  impulse  delivered  was 
6.5 kPa-sec. A decaying exponential function was fitted to these parameters,
resulting in the following equation

$$P(t) = 13.76\, e^{-2117\,t} \tag{1}$$

with P in MPa and t in seconds. A curve of this function varying in time is
shown in Figure 4. All points on
the  surface  of  the  mine  indicated  in  Figure  3  were  loaded  with  this  transient 
load  beginning  at  zero  seconds. 
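
As a check on the fit, the short Python sketch below integrates Equation (1) in closed form and recovers the quoted peak pressure and impulse. The function names are illustrative only and are not part of the original analysis; the units (MPa, seconds) are inferred from the quoted peak pressure and impulse.

```python
import math

P0_MPA = 13.76       # coefficient of Equation (1), MPa
DECAY_PER_S = 2117   # decay constant of Equation (1), 1/sec


def pressure_mpa(t_s):
    """Surface pressure in MPa at time t_s (sec); zero before the load starts."""
    if t_s < 0.0:
        return 0.0
    return P0_MPA * math.exp(-DECAY_PER_S * t_s)


def impulse_kpa_s(t_end_s=math.inf):
    """Integral of Equation (1) from 0 to t_end_s, in kPa-sec."""
    decayed = 0.0 if math.isinf(t_end_s) else math.exp(-DECAY_PER_S * t_end_s)
    return 1000.0 * P0_MPA / DECAY_PER_S * (1.0 - decayed)


if __name__ == "__main__":
    print(f"peak pressure : {pressure_mpa(0.0):.2f} MPa")
    print(f"total impulse : {impulse_kpa_s():.2f} kPa-sec")
```

The total impulse is simply the ratio of the two fitted constants, which is why a single exponential can match both measured quantities.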

3.  MATERIAL  PROPERTIES  AND  FAILURE  CRITERIA.  Material  properties  were 
required  for  the  steel  jacket,  the  composition  B  explosive  filler,  and  the 
soil  in  which  the  mine  is  emplaced.  Of  these  three,  mechanical  properties 
were  measured  only  for  the  steel  jacket.  The  data  for  the  explosive  and  soil 
were  taken  from  available  publications.  Failure  criteria  are  developed  for 
the steel jacket and the explosive.

Figure 3. Base Support and Loaded Surfaces

Figure 4. Shock Loading Function for M-15 Antitank Mine


A.  WD-1010  Steel.  The  M-15  jacket  is  made  of  a  medium  strength 
steel  alloy  with  a  density  of  7.80  g/cm3  and  a  thickness  of  0.94  mm.  Six 
tensile  specimens  were  cut  from  an  inert  training  mine  as  shown  in  Figure  5. 
Two  specimens  were  cut  from  each  of  the  significant  surfaces  of  the  mine  body. 
These  specimens  were  machined  with  a  large  radius  on  the  test  section  as 
shown  in  Figure  5(b).  A  biaxial  strain  gage  was  attached  at  the  location  of 
the  minimum  width.  The  specimens  were  then  tested  in  an  Instron  Testing 
Machine.  The  stress-strain  curves  resulting  are  shown  in  Figures  6-8, 
plotted  in  pairs  according  to  the  location  of  the  specimens  on  the  mine 
surface. The stress-strain curves of the two specimens in each pair are
similar to one another, but are surprisingly dissimilar to those of specimens
taken from different areas of the mine body. Of the three pairs of
stress-strain results, the data for the top
annular  surface  showed  the  most  disparity.  The  other  two  sets  of  data 
indicate  very  closely  matched  properties.  It  is  evident  that  the  metal  is 
work  hardened  in  various  areas  by  the  stamping  operation  by  which  the  upper 
part  of  the  mine  jacket  is  shaped. 


As  can  be  seen  in  the  vertical  section  in  Figure  2,  the  jacket  is  formed 
from  two  sheet  steel  blanks,  which  after  being  shaped  are  joined  by  a  360° 
crimp  around  the  lower  periphery  of  the  mine.  The  bottom  of  the  mine  was 
assumed  to  have  the  same  properties  as  the  pressure  plate  well  area  because 
neither is deformed appreciably in the forming operation. The metal in the
pressure plate well area exhibits properties typical of mild steel (Figure 6):
a slight overshoot of the yield stress, then a relatively flat, low-modulus
section, followed by a large strain to failure.

Bilinear  approximations  to  the  stress-strain  curves  are  also  shown  in 
Figures  6-8.  These  bilinear  approximations  were  obtained  by  averaging  the 
data  for  the  individual  specimens.  The  version  of  the  ADINA  (References  3 
and  4)  finite  element  code  used  in  this  analysis  has  a  bilinear,  elastic- 
plastic,  von  Mises  yield  condition,  kinematic  hardening,  axisymmetric  2-D 
element which was used to model the steel jacket. A summary of the
significant parameters is given in Table 1.

Because  the  steel  in  the  mine  jacket  has  an  appreciable  amount  of 
ductility,  it  was  desired  to  apply  a  failure  criterion  which  included  a 
measure of the deviatoric strain. The deviatoric strain is defined by the
usual formula

$$e_{ij} = \varepsilon_{ij} - \varepsilon_{kk}\,\delta_{ij}/3 \tag{2}$$

where

$e_{ij}$ is the deviatoric strain component
$\varepsilon_{ij}$ is the total strain component
$\delta_{ij}$ is the Kronecker delta


and repeated indices imply a summation. The strain is assumed composed of an
elastic and a plastic component such that

$$\varepsilon_{ij} = \varepsilon^{e}_{ij} + \varepsilon^{p}_{ij} \tag{3}$$

The plastic component of the spherical strain is assumed zero, i.e.,

$$\varepsilon^{p}_{kk} = 0 \tag{4}$$

Upon substitution of Equation (3) into Equation (2) and using an equation
similar to Equation (3) for the total deviatoric strain,
$e_{ij} = e^{e}_{ij} + e^{p}_{ij}$, and further using

$$e^{e}_{ij} = \varepsilon^{e}_{ij} - \varepsilon^{e}_{kk}\,\delta_{ij}/3,$$

and Equation (4), one finds the result

$$e^{p}_{ij} = \varepsilon^{p}_{ij} \tag{5}$$

Figure 5. Tensile Stress-Strain Specimens for the M-15 Mine Base Assembly:
(a) location of specimens, (b) preparation of specimen, dimensions in cm

[Plot title: Pressure Plate Well Tensile Tests]

Figure 7. Stress-Strain Curves for Top Annular Tensile Specimens

TABLE 1. WD-1010-T4 SHEET STEEL STRESS-STRAIN DATA


The criterion which was selected to predict failure of the steel casing
material was the second invariant of plastic deviatoric strain,
$I_2(e^{p}_{ij})$. This quantity is defined by

$$I_2(e^{p}_{ij}) = \tfrac{1}{2}\, e^{p}_{ij}\, e^{p}_{ij} \tag{6}$$

where summation is implied.

In the uniaxial tension test where the load is applied in the z-direction,
we have

$$\varepsilon^{p}_{xx} = \varepsilon^{p}_{yy}$$

such that, with Equation (4),

$$\varepsilon^{p}_{xx} = \varepsilon^{p}_{yy} = -\tfrac{1}{2}\,\varepsilon^{p}_{zz} \tag{7}$$

Thus, for the 1-D tension test we have

$$I_2(e^{p}_{ij}) = \tfrac{3}{4}\,(\varepsilon^{p}_{zz})^2 \tag{8}$$

The value of this quantity at the failure strain can be obtained from the
1-D tensile test data given in Table 1 by the formula

$$\varepsilon^{p}_{zz} = \varepsilon_{zz} - \sigma_{zz}/E_y \tag{9}$$

where $E_y$ is Young's modulus. The value of the stress, $\sigma_{zz}$, at the
failure strain is given by (writing the quantities without the zz subscript
for convenience)

$$\sigma_f = \sigma_y + E_t\,(\varepsilon_f - \sigma_y/E_y) \tag{10}$$

where $\sigma_y$ is the yield stress, and $E_t$ is the plastic tangent modulus.

Upon substitution of Equation (10) into Equation (9), one finds for the
deviatoric plastic strain at failure

$$\varepsilon^{p}_f = (\varepsilon_f - \sigma_y/E_y)\,(1 - E_t/E_y) \tag{11}$$

where all the quantities in this equation are measured in the 1-D tensile
test.
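
The chain from the bilinear tensile data to a failure value of the invariant can be sketched in a few lines of Python. This is only an illustration of Equations (8) through (11): the function names are ours, and the numerical inputs in the usage lines are placeholders chosen for illustration, not the measured values of Table 1.

```python
def plastic_strain_at_failure(eps_f, sigma_y, e_young, e_tangent):
    """Equation (11): plastic strain at failure from bilinear tensile data."""
    return (eps_f - sigma_y / e_young) * (1.0 - e_tangent / e_young)


def i2_uniaxial(eps_p):
    """Equation (8): second invariant of plastic deviatoric strain, 1-D test."""
    return 0.75 * eps_p ** 2


if __name__ == "__main__":
    # Placeholder inputs (NOT Table 1 data): failure strain, yield stress (MPa),
    # Young's modulus (MPa), plastic tangent modulus (MPa).
    eps_p = plastic_strain_at_failure(0.20, 250.0, 200.0e3, 1.0e3)
    print(f"plastic strain at failure: {eps_p:.5f}")
    print(f"failure value of I2      : {i2_uniaxial(eps_p):.6f}")
```

Because the invariant varies as the square of the plastic strain, a tenfold spread in failure strain between specimen locations would appear as a hundredfold spread in the failure value.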

The values of the invariant $I_2(e^{p}_{ij})$ at the failure strain are listed
in Table 1. It is of interest to note that the values of this quantity range
over nearly two orders of magnitude for the three sets of tensile specimens.

B.  Composition  B-3  Explosive.  Composition  B-3  explosive  is  a 
viscoelastic material which is available in three forms: pressed, cast, and
powder. In most munitions, the explosive is normally inserted into its
container  by  pouring  in  the  molten  state,  so  that  it  is  called  cast. 
Composition  B  explosive  and  composition  B-3  explosive  are  similar,  but  the 
B-3  form  has  no  wax  content  (Ref.  5). 


                     RDX     TNT     Wax
Composition B        63%     36%     1%
Composition B-3      60%     40%     none

The material properties used herein are those for composition B-3 and are
taken primarily from Reference 6.

After  surveying  the  available  material  properties  of  composition  B-3 
explosive  and  the  various  2-D  axisymmetric  materials  models  in  the  ADINA  code, 
it was decided that the curve description material model was the appropriate
model to use (see pp. XII.16-21 of Ref. 3). This model requires tables of
bulk moduli and shear moduli versus volume strain. Specifically, the loading
bulk modulus, the unloading bulk modulus, and the loading shear modulus as
functions of volume strain are required.

A relationship between the volume strain and the bulk modulus was
obtained from the Mie-Grüneisen equation of state (EOS) and certain other
assumptions which are detailed in Reference 7. The equation relating the
bulk modulus to the volume strain is

$$K = \frac{\Gamma(\Gamma+1)\,(A y^2 + B y^3 + C y^4)}{2 - y\,\Gamma} + A + A'y + B'y^2 + C'y^3 \tag{12}$$

where

$K$ = the loading bulk modulus
$\Gamma$ = the Grüneisen coefficient
$A, B, C$ = the coefficients appearing in the Grüneisen EOS in terms of $y$
$A' = A(\Gamma+1) + 2B$
$B' = B(\Gamma+2) + 3C$
$C' = C(\Gamma+3)$
$y = \varepsilon_v/(1-\varepsilon_v)$
$\varepsilon_v = (V_0 - V)/V_0$ = volume strain taken positive in compression
$V_0 = 1/\rho_0$ = specific volume at normal conditions.


The values taken from Reference 6 for the materials constants are as
follows:

$\rho_0$ = 1.68 g/cm^3
$\Gamma$ = 0.947 (assumed invariant as a function of V)
$A$ = 13.5 GPa
$B$ = 9.5 GPa
$C$ = 100.6 GPa
$\nu$ = 0.29 = Poisson's ratio

Note that when $\varepsilon_v = 0$, $y = 0$, $K = A$ and $V = V_0$. Also, in
the Grüneisen EOS, at $\varepsilon_v = 0$, we take the pressure and internal
energy to be zero, $P_0 = E_0 = 0$.


Because  no  data  were  available  to  relate  the  unloading  bulk  modulus  to 
the  volume  strain  for  composition  B-3  explosive,  the  same  values  of  the  bulk 
modulus  for  unloading  as  for  loading  were  used.  The  loading  shear  modulus 
was obtained from the loading bulk modulus by use of the relationship

$$G_\ell = \frac{3K\,(1-2\nu)}{2\,(1+\nu)} \tag{13}$$


Figure 9 shows the graphical relationship represented by Equations (12)
and (13). Table 2 gives the values of the two moduli as they were used in
the ADINA program. ADINA uses linear interpolation between discrete points.
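
The following Python sketch evaluates Equations (12) and (13) with the constants quoted above and compares the result with the Table 2 entries. It is a reader's check, not the authors' program: the denominator $2 - y\Gamma$ in the first term is our reading of the printed equation, and the helper names are hypothetical. With A = 13.5 GPa the computed moduli agree with the tabulated ones to within a few hundredths of a GPa.

```python
GAMMA = 0.947                 # Grueneisen coefficient
A, B, C = 13.5, 9.5, 100.6    # EOS coefficients, GPa
NU = 0.29                     # Poisson's ratio

# Table 2 entries: (volume strain in percent, bulk modulus, shear modulus), GPa
TABLE_2 = [(0.0, 13.52, 6.60), (1.0, 14.00, 6.84), (2.5, 14.91, 7.28),
           (3.75, 15.83, 7.73), (5.0, 16.92, 8.26), (10.0, 23.36, 11.41)]


def bulk_modulus(eps_v):
    """Equation (12); eps_v is the volume strain, positive in compression."""
    y = eps_v / (1.0 - eps_v)
    a1 = A * (GAMMA + 1.0) + 2.0 * B
    b1 = B * (GAMMA + 2.0) + 3.0 * C
    c1 = C * (GAMMA + 3.0)
    # Assumed reading of the first term: divided by (2 - y * GAMMA).
    first = GAMMA * (GAMMA + 1.0) * (A * y**2 + B * y**3 + C * y**4)
    return first / (2.0 - y * GAMMA) + A + a1 * y + b1 * y**2 + c1 * y**3


def shear_modulus(k):
    """Equation (13): shear modulus from bulk modulus at fixed Poisson's ratio."""
    return 3.0 * k * (1.0 - 2.0 * NU) / (2.0 * (1.0 + NU))


def interpolate_bulk(eps_pct):
    """Linear interpolation between tabulated points, as the text describes."""
    for (e0, k0, _), (e1, k1, _) in zip(TABLE_2, TABLE_2[1:]):
        if e0 <= eps_pct <= e1:
            return k0 + (k1 - k0) * (eps_pct - e0) / (e1 - e0)
    raise ValueError("strain outside the tabulated range")


if __name__ == "__main__":
    for eps_pct, k_tab, g_tab in TABLE_2:
        k = bulk_modulus(eps_pct / 100.0)
        print(f"{eps_pct:5.2f}%  K={k:6.2f} (table {k_tab:5.2f})  "
              f"G={shear_modulus(k_tab):5.2f} (table {g_tab:5.2f})")
```

Note the wide spacing between the last two table points: a linear interpolation between 5 and 10 percent strain will overestimate the curved relation of Equation (12) somewhat in that interval.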

The tensile volumetric strain at failure for composition B-3 explosive
is given in Reference 6 as -0.1 percent. This criterion was used in all
calculations presented in this report. The ADINA code applies this failure
criterion by the artifice of superimposing, on the strains produced by the
applied load, an in-situ gravity pressure sufficient to cause a hydrostatic
compression equal in magnitude to the tensile failure strain. Then, when the
total strain becomes negative, a tension cut-off plane is assumed to form
normal to the principal strain. The normal and shear stiffnesses across this
plane are reduced by a factor determined by an input value. One or two
additional planes orthogonal to existing tension cut-off plane(s) are allowed
to form if the strain criterion is met. A plane becomes inactive if
compression again develops in the direction normal to it.

TABLE 2. ADINA INPUT VALUES FOR BULK AND SHEAR MODULI FOR
COMPOSITION B-3 EXPLOSIVE

Point No.   Volume strain   Loading bulk   Unloading bulk   Loading shear
            (%)             modulus (GPa)  modulus (GPa)    modulus (GPa)
1            0              13.52          13.52             6.60
2            1.0            14.00          14.00             6.84
3            2.5            14.91          14.91             7.28
4            3.75           15.83          15.83             7.73
5            5.0            16.92          16.92             8.26
6           10.0            23.36          23.36            11.41


The pseudo-hydrostatic pre-strain is applied by positioning the vertical
coordinate (z-coordinate) at the proper negative value. The hydrostatic
pressure applied at an element integration point is given for an element, j,
by

$$P_j = -\rho\, g \sum_{i=1}^{N} h_{ij}\, z_{ij} \tag{14}$$

where

$\rho$ is the density of the overburden
$h_{ij}$ is the shape function for node i of element j
$z_{ij}$ is the vertical coordinate for node i in element j.

Figure 9. Relationship of Bulk Modulus and Shear Modulus to Volume Strain
for Composition B-3 Explosive

The position of the system vertical coordinate can be obtained from the
equation

$$z_{ave} = \frac{K_0\, \varepsilon^{f}_{v}}{g\,\rho} \tag{15}$$

where

$K_0$ is the initial bulk loading modulus
$\varepsilon^{f}_{v}$ is the volumetric failure strain, negative in tension
$g$ is the acceleration of gravity.
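
To give a feeling for the size of this artifice, the Python sketch below evaluates Equation (15) and checks it against Equation (14) for a uniform depth, where the shape functions sum to one. The overburden density used here is an assumption made only for illustration (the explosive density of 1.68 g/cm^3); the value actually entered in the ADINA input is not stated in the text.

```python
G_ACCEL = 9.81  # acceleration of gravity, m/sec^2


def prestrain_depth_m(k0_pa, eps_fail, rho_kg_m3):
    """Equation (15): vertical coordinate giving the required pre-compression."""
    return k0_pa * eps_fail / (G_ACCEL * rho_kg_m3)


def hydrostatic_pressure_pa(z_m, rho_kg_m3):
    """Equation (14) for a uniform depth z (shape functions sum to one)."""
    return -rho_kg_m3 * G_ACCEL * z_m


if __name__ == "__main__":
    K0 = 13.52e9        # initial loading bulk modulus, Pa (Table 2, point 1)
    EPS_FAIL = -0.001   # tensile volumetric failure strain (-0.1 percent)
    RHO = 1680.0        # ASSUMED overburden density, kg/m^3
    z = prestrain_depth_m(K0, EPS_FAIL, RHO)
    p = hydrostatic_pressure_pa(z, RHO)
    print(f"z_ave = {z:.0f} m, pre-compression = {p / 1e6:.2f} MPa")
```

The required coordinate is large and negative, which is why the mine dimensions themselves have a negligible effect on the pre-strain.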

C.  Soil  Simulation.  For  the  structural  response  calculations 
denoted  by  Case  A,  nodal  tie  elements  were  used  to  model  the  base  support  as 
nonlinear  springs.  No  simulation  of  the  soil  was  necessary  for  the  rigid 
support  calculations  of  Case  B. 

The  nodal  tie  element  is  an  element  which  is  available  in  the  Ballistic 
Research  Laboratories  version  of  the  ADINA  code.  Of  the  three  types  of  nodal 
tie  elements  available,  the  one  which  was  appropriate  was  the  boundary  type 
element.  This  element  is  defined  by  one  node  only  and  is  capable  of  three 
translational degrees of freedom (DOF) and three rotational DOF's. In the
application at hand, the elements along the base of the mine were used to
transmit a vertical force ($F_z$), while those along the side transmitted a
horizontal force ($F_y$).

Due  to  the  large  variety  of  soils  in  which  mines  would  be  emplaced,  it  is 
obvious  that  one  can  only  select  a  soil  simulation  model  which  would  be 
representative  of  some  subclass  of  soils.  With  this  in  mind,  a  typical  load 
deflection  curve  (Reference  8)  was  selected  to  define  the  nodal  tie  element 
properties. The average load-deflection slope for the elastic loading range
given in Reference 8 is 0.0815 MPa/cm (30 psi/inch). The load-deflection
response quoted is for slowly varying loads. For shock loads, the soil would
be stiffer.

In an attempt to account for the dynamic response of soil, a nonlinear
quadratic component was added to the force-deflection property. The magnitude
of the nonlinear response term was made equal to the linear component at a
deflection of 2.5 cm. The nodal spring constant and spring force as a
function of vertical displacement are shown in Figure 10. These values were
used for the nodal tie elements along the base of the mine. It should be
noted that when the displacement is positive, the spring force and spring
constant are zero.
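
One way to write the base support law just described is sketched below in Python. It assumes that the "spring constant" is the tangent stiffness of the pressure-deflection curve and that positive displacement is upward; both are our interpretation, and the function names are hypothetical.

```python
K_LINEAR = 0.0815   # static load-deflection slope, MPa per cm (Reference 8)
D_EQUAL = 2.5       # deflection (cm) at which the quadratic term equals the linear


def soil_pressure_mpa(w_cm):
    """Upward support pressure for vertical displacement w_cm (positive up)."""
    if w_cm >= 0.0:
        return 0.0                      # soil cannot pull the mine downward
    d = -w_cm                           # downward deflection, cm
    return K_LINEAR * d * (1.0 + d / D_EQUAL)


def soil_stiffness_mpa_per_cm(w_cm):
    """Tangent stiffness d(pressure)/d(deflection); zero when the base lifts."""
    if w_cm >= 0.0:
        return 0.0
    return K_LINEAR * (1.0 + 2.0 * (-w_cm) / D_EQUAL)


if __name__ == "__main__":
    for w in (0.5, 0.0, -1.0, -2.5, -5.0):
        print(f"w = {w:5.1f} cm  p = {soil_pressure_mpa(w):.4f} MPa  "
              f"k = {soil_stiffness_mpa_per_cm(w):.4f} MPa/cm")
```

At a deflection of 2.5 cm the quadratic term doubles the static pressure, which is the stiffening the authors introduced to represent shock loading.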

For  the  support  along  the  vertical  sides  of  the  mine,  a  linear  spring 
force  is  used.  This  was  done  with  two  thoughts  in  mind.  First,  the  movement 
of  the  mine  in  the  lateral  direction  is  small.  Second,  the  soil  on  the  sides 
of  the  mine  is  disturbed  when  the  mine  is  emplaced  and  the  soldier  is  not 
going  to  tamp  the  soil  there,  except  lightly,  while  the  mine  is  armed. 

In  the  ADINA  input  data,  the  foregoing  nonlinear  stiffness  values  are 
adjusted by a factor proportional to an annular sector of π radians and a
radial  extent  appropriate  for  the  particular  nodal  tie  element.  Along  the 
vertical  side,  the  linear  nodal  tie  element  stiffness  values  are  proportional 
to  the  height  of  the  particular  element  onto  which  the  nodal  tie  boundary 
element  is  attached. 

4.  FINITE  ELEMENT  MODEL  DESCRIPTION  AND  CALCULATIONS. 


A.  Mesh  Generation.  The  finite  element  mesh  for  the  mine  was 
generated with the aid of the GEN3D mesh generator code (Ref. 9). The
resulting meshes for the steel and explosive element groups are shown in
Figures 11 and 12. A six node QUAD element with quadratic displacement
interpolation  functions  in  the  direction  parallel  to  the  surface  was  used  for 
the  steel  casing.  This  element  models  the  bending  of  the  thin  metal  casing 
better  than  a  four  node  QUAD.  A  four  node  QUAD  was  used  to  model  the 
explosive  except  at  the  material  interface. 

In  ADINA,  each  material  having  a  distinct  model  for  response  must  be 
modeled  as  a  separate  element  group.  For  the  Case  A  calculations  with  the 
nodal  tie  support  elements,  four  element  groups  were  required:  (1)  linear 
node  ties  on  the  side  of  the  mine,  (2)  nonlinear  node  ties  on  the  base  to 
simulate  soil  support,  (3)  nonlinear  2-D  solid  elements  for  the  steel  case, 
and  (4)  nonlinear  curve  description  2-D  solid  elements  for  the  explosive. 

For  the  steel  case,  three  material  subtypes  were  used  to  model  the  steel 
properties  as  shown  in  Table  1.  The  assignment  of  the  material  subtypes  to 
the  areas  of  the  mine  case  is  shown  in  Figure  11.  For  the  Case  B  calculations, 
the  nodal  tie  element  groups  were  not  required. 

B. Time Step Solution. In ADINA, one has the choice of marching
the dynamic solution forward via explicit or implicit finite difference
techniques. In general, it is difficult to make absolute statements as to
which is best for a given application. For shock loads such as indicated in
Figure 4, it has been our experience that the explicit method gives the
higher quality solution. The subject problem was run for a relatively large
number of cycles using both implicit (with equilibrium iterations included)
and explicit time integration solutions. After a given amount of problem
solution time, the results were compared. The explicit solution had a
smoother variation in both displacements and stresses. For this reason, we
selected the explicit solution method.

Figure 10. Nonlinear Elastic Nodal Tie Boundary Stiffness (axis: spring
constant, MPa/cm)

Figure 11. Element Configuration for Steel Casing, M-15 Antitank Mine


The time step selected for the explicit scheme was sized to the thickness
of the metal casing.

$$\Delta t = \frac{0.00094}{3\,\left[\,2 \times 10^{11}/7800\,\right]^{1/2}} \approx 60\ \text{nsec}$$

A time step of 50 nanoseconds was used for all calculations.
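
The arithmetic of this estimate is reproduced in the Python sketch below. We read the numbers in the expression as the casing thickness in meters, a steel modulus of 2 x 10^11 Pa, and a density of 7800 kg/m^3, with the factor of 3 acting as a safety margin on the wave transit time; the function name is ours.

```python
import math


def transit_time_step(thickness_m, youngs_pa, density_kg_m3, factor=3.0):
    """Time for an elastic wave to cross the casing thickness, divided by factor."""
    wave_speed = math.sqrt(youngs_pa / density_kg_m3)   # bar wave speed, m/sec
    return thickness_m / (factor * wave_speed)


if __name__ == "__main__":
    dt = transit_time_step(0.00094, 2.0e11, 7800.0)
    print(f"estimated time step: {dt * 1e9:.1f} nsec")
```

The 50 nanosecond step actually used lies below this estimate, so it satisfies the sizing rule with a further small margin.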

Eigenfrequencies and mode shapes were calculated for the lowest four
modes for both Cases A and B. The eigenfrequencies are shown in Table 3.

TABLE 3. EIGENFREQUENCIES AND PERIODS

              Case A                                  Case B
Frequency   Period                          Frequency   Period
(cps)       (sec)                           (cps)       (sec)
  36.44     2.744 x 10^-2 (Rigid Body Mode)   6426      1.556 x 10^-4
3636        2.750 x 10^-4                     7899      1.266 x 10^-4
6710        1.490 x 10^-4                     9685      1.032 x 10^-4
8531        1.172 x 10^-4                    12186      8.205 x 10^-5

The 0.5 x 10^-7 second time step may be compared to the 1.5 x 10^-4 second
period of the fundamental mode of the rigid support configuration. It is
readily seen that there is no danger of not capturing the response of the
fundamental eigenfrequency as well as many of the higher frequency modes.
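
The margin can be expressed as the number of time steps per period of each mode. The Python sketch below does this for the Case B frequencies of Table 3; it is a simple check added for the reader, and the names are illustrative.

```python
TIME_STEP_S = 0.5e-7                       # time step used, sec
CASE_B_CPS = [6426, 7899, 9685, 12186]     # Case B eigenfrequencies, Table 3


def steps_per_period(frequency_cps, dt_s=TIME_STEP_S):
    """Number of time steps that fit in one period of the given mode."""
    return 1.0 / (frequency_cps * dt_s)


if __name__ == "__main__":
    for f in CASE_B_CPS:
        print(f"{f:6d} cps: {steps_per_period(f):7.0f} steps per period")
```

Even the fourth mode is resolved by well over a thousand steps per cycle.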

C. ADINA Program Modifications. Three significant modifications to
the ADINA program were required in the course of performing the calculations
herein. First, it was necessary to correct the stress state to lie on the
yield surface to a higher order of accuracy than that fulfilled in the
standard ADINA program. In the ADINA program, the von Mises yield criterion
may be applied as

$$\Sigma_{ij}\,\Sigma_{ij} \le \tfrac{2}{3}\,\sigma_s^2 \tag{16}$$

where

$\Sigma_{ij} = S_{ij} - \alpha_{ij}$
$S_{ij}$ = deviatoric stress component
$\alpha_{ij}$ = tensor defining the center of the current yield surface
$\sigma_s$ = current yield stress.

Upon taking a time step during which plasticity occurs, the new yield
criterion is

$$(\Sigma_{ij} + \Delta\Sigma_{ij})\,(\Sigma_{ij} + \Delta\Sigma_{ij}) \le \tfrac{2}{3}\,\sigma_s^2 \tag{17}$$

The quadratic term, $\Delta\Sigma_{ij}\,\Delta\Sigma_{ij}$, was neglected in
ADINA, which caused the stress state to gradually creep outside the yield
surface. At some point, this caused a negative square root to be encountered,
which in turn aborted the solution. Upon inclusion of the second order
correction, no further difficulties of this type were encountered. Further
details of this correction may be found in References 10 and 11.
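
The drift can be demonstrated with a toy calculation. In the Python sketch below a three-component vector stands in for the tensor $\Sigma_{ij}$, each increment is made tangent to the yield surface (the first-order condition), and the state is optionally scaled back onto the surface after every step. The scale-back is one common remedy chosen for illustration; it is not claimed to be the correction of References 10 and 11, and all names are hypothetical.

```python
import math


def dot(a, b):
    """Contraction of two stand-in 'tensors' stored as flat lists."""
    return sum(x * y for x, y in zip(a, b))


def march(sigma_s, steps, step_fraction, correct):
    """Return Sigma.Sigma divided by (2/3) sigma_s^2 after the given steps."""
    radius_sq = 2.0 / 3.0 * sigma_s ** 2
    sigma = [math.sqrt(radius_sq), 0.0, 0.0]    # start exactly on the surface
    drive = [0.3, 1.0, -0.5]                    # arbitrary loading direction
    for _ in range(steps):
        # First-order consistency: keep only the part of the increment that
        # is tangent to the surface, so that Sigma . dSigma = 0.
        along = dot(sigma, drive) / dot(sigma, sigma)
        tangent = [d - along * s for s, d in zip(sigma, drive)]
        scale = step_fraction * math.sqrt(radius_sq / dot(tangent, tangent))
        sigma = [s + scale * t for s, t in zip(sigma, tangent)]
        if correct:
            # Account for the neglected quadratic term by returning the
            # state to the surface.
            back = math.sqrt(radius_sq / dot(sigma, sigma))
            sigma = [back * s for s in sigma]
    return dot(sigma, sigma) / radius_sq


if __name__ == "__main__":
    print("uncorrected:", march(250.0, 50, 0.01, correct=False))
    print("corrected  :", march(250.0, 50, 0.01, correct=True))
```

Each uncorrected step adds exactly the neglected quadratic term to the left side of Equation (17), so the violation grows in proportion to the number of plastic steps; with a 50 nanosecond step and thousands of cycles this accumulates quickly.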


The second major modification of ADINA involved the addition of
subroutines and modification of current routines to allow the monitoring of
extremal principal stresses and strains for the steel element group. ADINA
normally does not print strains, and extremal values of the stresses are often
difficult to discern. The routines included here check the values of the
three principal stresses and three principal strains at each time step
against currently stored maxima/minima for similar quantities. Any new
extremal values found are placed in a save table which also includes the time
and location of occurrence as well as a snapshot of the Cartesian
stress-strain state. This table may be printed when desired. It is also
included in the restart file for consistency.
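
The bookkeeping described here can be pictured with the small Python sketch below. The class and field names are invented for illustration and do not correspond to the actual subroutines added to ADINA.

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: float
    time: float
    element: int
    point: int
    state: tuple   # snapshot of the Cartesian stress-strain state


class ExtremalTable:
    """Running maxima and minima of named quantities (hypothetical layout)."""

    def __init__(self):
        self.maxima = {}
        self.minima = {}

    def update(self, name, value, time, element, point, state):
        """Compare one new value against the stored extrema for that quantity."""
        rec = Record(value, time, element, point, tuple(state))
        if name not in self.maxima or value > self.maxima[name].value:
            self.maxima[name] = rec
        if name not in self.minima or value < self.minima[name].value:
            self.minima[name] = rec

    def rows(self):
        """Yield (name, minimum record, maximum record) for printing."""
        for name in sorted(self.maxima):
            yield name, self.minima[name], self.maxima[name]
```

Calling `update` once per principal stress and strain at every integration point and time step keeps the table current, and because it is an ordinary data structure it can be written to a restart file along with the solution.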


The third modification of ADINA was required to monitor the failure
criterion for the steel casing. Provision was made to input for each
material subtype the 1-D measured value at failure of the second invariant
of plastic strain. At each integration point for each element for each time
step, the 3-D value of $I_2(e^{p}_{ij})$ is calculated and compared to the
indicated failure value for that material. Information is saved, giving the
maximum value of $I_2(e^{p}_{ij})$ for each element as well as the integration
point number and the time of occurrence. A compact table is printed as desired
showing for each element whether plasticity has occurred; if it has, what the
maximum value of $I_2(e^{p}_{ij})$ was; whether the value has exceeded the
input failure value; and if so, at what time and location.
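
A minimal Python sketch of such a monitor follows. It assumes the plastic strain is supplied as a 3 x 3 tensor (tensor, not engineering, shear components) and uses the definition of the invariant as one half of the contraction of the plastic deviatoric strain with itself; the class layout and names are hypothetical.

```python
def second_invariant(eps_p):
    """I2 = 1/2 e_ij e_ij for a 3x3 plastic strain tensor given as nested lists."""
    # The mean is subtracted as a guard; it is zero if plastic flow is
    # volume preserving, as assumed in the text.
    mean = (eps_p[0][0] + eps_p[1][1] + eps_p[2][2]) / 3.0
    total = 0.0
    for i in range(3):
        for j in range(3):
            e = eps_p[i][j] - (mean if i == j else 0.0)
            total += e * e
    return 0.5 * total


class FailureMonitor:
    def __init__(self, failure_i2):
        self.failure_i2 = failure_i2   # material subtype -> 1-D failure value
        self.peak = {}                 # element -> (max I2, point, time)
        self.first_failure = {}        # element -> (time, point)

    def check(self, element, point, material, time, eps_p):
        """Update the per-element records with one integration point result."""
        i2 = second_invariant(eps_p)
        if i2 > self.peak.get(element, (0.0, None, None))[0]:
            self.peak[element] = (i2, point, time)
        failed = i2 >= self.failure_i2[material]
        if failed and element not in self.first_failure:
            self.first_failure[element] = (time, point)
        return i2
```

For a uniaxial, volume-preserving plastic strain state this 3-D value reduces to three quarters of the square of the axial plastic strain, so the 1-D failure values from the tensile tests can be compared directly with the 3-D values from the calculation.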

5. ANALYSIS OF RESULTS. The output from these calculations included
the usual printed output from ADINA giving Cartesian displacements and stresses,
forces in the node tie elements, as well as the tables of extremal stresses
and strains and tables associated with the failure criteria. In addition,
plot files were saved every 2.5 microseconds, giving a complete picture of the
solution. The latter were used in conjunction with two post-processors,
PLOT1D and PLOT3D (Ref. 9), to provide a graphical picture of such quantities
as deformed shapes, contour plots of various stress and strain components as
well as time varying plots of quantities of interest.

The calculations for both Case A and Case B were carried out to a
response time of 2 milliseconds. For Case A, four failures of the steel
jacket were predicted as shown in Figure 13. The failures predicted were
all in the area of the primary fuze well. However, other areas on the steel
jacket had significant plastic flow during the first two milliseconds of
response (2nd invariant of plastic strain attained at least one-half of the
failure value). Figure 14 shows the deformed shape of the configuration at
the time of the first predicted failure in the steel jacket. In order to
make the deformation of the structure more meaningful, the rigid body motion
on the vertical springs has been subtracted out. Also, the vector
displacement as measured from the reference points, indicated by the crosses,
has been magnified by an appropriate factor. Figure 15 shows a contour plot of
the hoop stress in the critical fuze well area at the time of the first
predicted failure. It is readily seen that there are strong bending stresses
occurring near the corners of the fuze well. Figure 16 shows contours of
constant radial strain at this same time for the nonlinear spring supported
configuration.

Figures 17-20 show plots for Case B, with the rigid support, similar to those
of Figures 13-16 for Case A. For Case B, it was not necessary to subtract
rigid body motion since the base of the mine is restrained in the vertical
direction. Notice that the first failure for Case B occurred earlier than for
Case A and that the center of the fuze well is deflecting downward rather than
upward. In Case B, the vertical sides of the mine were loaded by the transient
pressure. In general, Case B is a much more highly constrained structure than
Case A. The deformation of Case B at approximately h microsecond time is
generally inward in compression. Figure 19 shows the center of the fuze well
with intense stress gradients. From Figure 20, it is evident that this area
is undergoing a rather strong shearing action.

The  printed  output  from  these  calculations  showed  that  the  explosive 
filler  was  developing  cracks  in  many  locations  and  directions  according  to  the 
-0.1%  tensile  failure  criterion.  These  cracks  were  indicated  very  early  in 
the  response  and  continued  to  develop  at  later  times.  Thus,  we  expect  the 
internal  mass  of  explosive  to  be  nearly  pulverized  by  the  time  of  the  casing 
ruptures  indicated.  Whenever  the  casing  ruptures,  it  is  assumed  that  a  large 
volume  of  the  explosive  would  be  ejected.  This  was,  in  fact,  the  indication 
in  the  tests  conducted  (Ref.  1).  It  is  not  possible  to  compare  the  deformed 
shapes  resulting  from  the  tests  described  in  Reference  1  with  those  predicted 
herein.  The  only  results  given  in  Reference  1  are  photographs  of  the  final 
result  of  the  deformation.  Also,  for  economy  reasons,  the  calculation  was  not 
run  far  enough  to  get  a  final  deformed  shape. 

Shown  in  Figure  21  is  the  motion  of  the  center  of  mass  of  the  mine  for 
Case  A  as  a  function  of  time.  It  is  seen  that  the  mine  is  deflecting  into 
the  soil  under  the  intense  pressure  pulse.  At  approximately  1.4  milliseconds, 
the  spring  forces  and  the  transient  top  surface  load  equilibrate.  Thereafter, 
the  base  support  load  becomes  larger  than  the  surface  loading  and  the  mine 
begins  to  decelerate.  At  2.0  milliseconds,  the  average  pressure  differential 
between the surface loading and the base reaction is 1.28 MPa.

Figure 13. Predicted Failures of M-15 Mine Jacket with Nonlinear Spring Support, Case A
[Figure labels: predicted failure locations and times of steel jacket; areas of significant, but lesser plastic flow]

Figure 14. Deformed Shape of M-15 Mine at the Time of First Predicted Failure, Case A

Figure 15. Contour Plot of Hoop Stress in Fuze Well Area at the Time of First Predicted Failure, Case A

Figure 16. Contour Plot of Radial Strain in Fuze Well Area at the Time of First Predicted Failure, Case A

Figure 17. Predicted Failures of M-15 Mine with Rigid Base Support, Case B
[Figure label: predicted failure locations and times of steel jacket]

Figure 18. Deformed Shape of M-15 Mine at Time of First Predicted Failure, Case B

Figure 19. Contour Plot of Hoop Stress in Fuze Well Area at the Time of First Predicted Failure, Case B

Figure 20. Contour Plot of Radial Strain in Fuze Well Area at Time of First Predicted Failure, Case B

Figure 21. Motion of Center of Mass of M-15 Mine with Nonlinear Spring Support


6.  CONCLUSIONS.  The  M-15  mine  is  predicted  to  fail  in  the  area  of  the 
central  fuze  cavity  at  the  13.8  MPa  peak  pressure,  6.5  kPa-sec  impulse  level 
in  both  the  spring  support  and  rigid  support  conditions.  This  agrees  with 
experimental  tests,  which  showed  catastrophic  failure  of  the  metal  casing 
and  ejection  of  the  secondary  fuze  wells. 

The  explicit  time  integration  method  gave  the  most  accurate  results  for 
the  shock  loaded  mine.  This  statement  is  based  on  the  smoothness  of  calculated 
displacements  and  stresses. 

The  M-15  mine  case  is  highly  inhomogeneous  in  its  constitutive  structural 
properties. Follow-on work should take account of these inhomogeneities. A
liberal sampling of stress-strain data measurements is indicated for such deep
drawn thin metal components.

Second  order  corrections  to  assure  that  the  stress  state  is  on  the  yield 
surface  during  plastic  flow  are  required. 

The  soil  medium  supporting  the  mine  and  the  nature  of  the  loading  of  the 
sidewall  have  a  significant  influence  on  the  resulting  response.  It  is 
recommended  that  the  soil  medium  be  included  explicitly  in  any  additional 
studies.  Attenuation  of  the  shock  in  the  area  of  the  sidewall  should  be 
investigated.

ABSTRACT. Computer programs for calculating the elastic-plastic response of
structures and solids employ a variety of plasticity algorithms which basically
differ in the approximations used for the flow rule and the yield condition. This
paper focuses on the linear kinematic hardening model of plastic behavior, presenting
a numerical comparison between a number of commonly used approximations and an
exact solution to this model. The exact solution is obtained by quadrature based
on assuming proportional straining during an increment. Comparisons are presented
for biaxial and triaxial states of stress and recommendations are made as to the
"best" approximation and the "optimum" number of subincrements needed for a given
accuracy.

1. INTRODUCTION. Computer programs that treat the elastic-plastic behavior
of materials usually employ an incremental approximation to the plasticity equations,
with the stress being calculated at discrete steps. This usually entails using a
finite difference approximation to the flow rule. In addition, for the sake of
simplicity, a linear approximation to the yield condition is often imposed on the
stress.


This paper concerns itself with code approximations to linear kinematic
hardening based on the Prager model of a yield surface translating in stress space
(ref. 1). The von Mises yield condition is employed with the associative flow
rule. Hardening is taken to be proportional to the plastic strain. Elastic-perfectly
plastic behavior is automatically included in the analysis by setting
the hardening parameter equal to zero. Both biaxial (plane stress) and triaxial
states of stress are analyzed.


2. REVIEW OF PLASTICITY EQUATIONS.* We briefly summarize the Prandtl-Reuss
theory of elastic-plastic behavior (ref. 2) as background. The theory assumes
the additive decomposition of the strain $\varepsilon_{ij}$ into an elastic strain
$\varepsilon^{e}_{ij}$ and a plastic strain $\varepsilon^{p}_{ij}$

$$\varepsilon_{ij} = \varepsilon^{e}_{ij} + \varepsilon^{p}_{ij} \qquad (1)$$

with the plastic portion taken to be incompressible

$$\varepsilon^{p}_{kk} = 0 \qquad (2)$$

*Cartesian tensor notation, including the summation convention, is employed
throughout, with Latin indices $i, j, k, \ldots = 1, 2, 3$ and Greek indices
$\alpha, \beta, \gamma, \ldots = 1, 2$.

The stress $\sigma_{ij}$ is assumed to be related to the elastic strain through Hooke's
law for an isotropic material

$$\sigma_{ij} = \frac{E}{1+\nu}\left[\varepsilon^{e}_{ij} + \frac{\nu}{1-2\nu}\,\varepsilon^{e}_{kk}\,\delta_{ij}\right] \qquad (3)$$

where $E$ is Young's modulus and $\nu$ is Poisson's ratio. As already mentioned, the
Prager model of kinematic hardening (ref. 1) is assumed, so that the von Mises
yield condition takes the form

$$\tfrac{1}{2}\,\Sigma_{ij}\,\Sigma_{ij} = \tfrac{1}{3}\,\sigma_y^{2} \qquad (4)$$

where

$$\Sigma_{ij} = S_{ij} - \alpha_{ij} \qquad (5)$$

and $\sigma_y$ is the yield stress from a uniaxial tensile test, $\alpha_{ij}$ measures the translation
of the yield surface in stress space, and $S_{ij}$ is the deviator of the stress:

$$S_{ij} = \sigma_{ij} - \tfrac{1}{3}\,\sigma_{kk}\,\delta_{ij} \qquad (6)$$
Figure 1. Representation of the von Mises yield condition in stress space as
viewed along the diagonal from the positive tensile stress octant.
[Figure labels: current location; incremental shift]

As shown by Figure 1, in stress space the yield condition represents a cylindrical
(hyper-) surface of constant radius $(2/3)^{1/2}\,\sigma_y$ with its axis parallel to the
diagonal from the origin into the positive tensile stress octant. Initially, the
axis passes through the origin, but as plastic deformation progresses, the
surface shifts in accordance with the hardening rule, which for a linear hardening
model is directly proportional to the plastic strain

$$\alpha_{ij} = \tfrac{2}{3}\,E_p\,\varepsilon^{p}_{ij} \qquad (7)$$

where $E_p$ is the plastic modulus as specified by the slope of the stress-strain
curve in the plastic range. Lastly, the plastic strain is determined using the
associative flow rule, which for the von Mises condition is

$$d\varepsilon^{p}_{ij} = \Sigma_{ij}\,d\lambda \qquad (8)$$

where $d\lambda > 0$ is the flow parameter which is adjusted so as to maintain the stress
on the yield surface during plastic flow. Combining (7) and (8) we see that
instantaneously the yield surface will translate in the direction of the normal
at the point of stress application, as portrayed in Figure 1. As a special case
these equations include elastic-perfectly plastic response by setting $E_p = 0$
so that the yield surface does not translate.


For purposes of analysis in this paper, the foregoing equations are reduced
to a fundamental set of equations

$$d\Sigma_{ij} = dS^{e}_{ij} - \kappa\,\Sigma_{ij}\,d\lambda \qquad (9)$$

$$dS^{e}_{ij} = \frac{E}{1+\nu}\left[d\varepsilon_{ij} - \tfrac{1}{3}\,d\varepsilon_{kk}\,\delta_{ij}\right] \qquad (10)$$

$$\Sigma_{ij}\,\Sigma_{ij} = \tfrac{2}{3}\,\sigma_y^{2} \qquad (11)$$

on the unknown components of $\Sigma_{ij}$, where

$$\kappa = 1 + \tfrac{2}{3}\,(1+\nu)\,\frac{E_p}{E} \qquad (12)$$

depends on the material constants and the flow parameter is redefined as

$$d\lambda \;\leftarrow\; \frac{E}{1+\nu}\,d\lambda \qquad (13)$$

In equation (9) $dS^{e}_{ij}$ operates as a forcing function specified by (10) as a
function of the prescribed strain differential. The yield condition (11) acts
as a constraint on the values of $\Sigma_{ij}$. The flow parameter is controlled so that
$d\lambda = 0$ when the stress is inside the yield surface or when $\Sigma_{ij}\,dS^{e}_{ij} < 0$ and the
stress is on the yield surface; otherwise, when $\Sigma_{ij}\,dS^{e}_{ij} \ge 0$, the flow parameter
takes on positive values that assure that the stress is maintained on the yield
surface. The solution of the system determines the history of the stress at a
point as a function of the strain history.
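The forcing term (10) and the loading test described above can be sketched in a few lines. This is a minimal illustration, not the author's code: the function names, the nested-list tensor layout, and the tolerance `tol` are assumptions introduced here.

```python
def elastic_deviator_increment(d_eps, E, nu):
    """Forcing term dS^e_ij of eq. (10) from a 3x3 strain increment (nested lists)."""
    mean = (d_eps[0][0] + d_eps[1][1] + d_eps[2][2]) / 3.0
    factor = E / (1.0 + nu)
    return [[factor * (d_eps[i][j] - (mean if i == j else 0.0))
             for j in range(3)] for i in range(3)]


def contract(a, b):
    """Double contraction a_ij b_ij of two 3x3 tensors."""
    return sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))


def is_plastic_loading(sigma, dS_e, sigma_y, tol=1e-9):
    """True when Sigma lies on the yield surface (11) and Sigma_ij dS^e_ij >= 0.

    tol is a hypothetical tolerance on the yield condition, scaled by sigma_y**2.
    """
    on_surface = abs(contract(sigma, sigma) - 2.0 * sigma_y ** 2 / 3.0) <= tol * sigma_y ** 2
    return on_surface and contract(sigma, dS_e) >= 0.0
```

When `is_plastic_loading` returns False the increment is treated as elastic, which corresponds to a zero flow parameter in (9).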


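In the case of biaxial states of stress or plane stress situations, where
the stress components for one coordinate, for example the 3 coordinate, vanish:
$\sigma_{13} = \sigma_{23} = \sigma_{33} = 0$, the previous system reduces to

$$d\Sigma_{\alpha\beta} = d\sigma^{e}_{\alpha\beta} - \left[\kappa\,\Sigma_{\alpha\beta} - \mu\,\Sigma_{\gamma\gamma}\,\delta_{\alpha\beta}\right] d\lambda \qquad (14)$$

$$\Sigma_{\alpha\beta}\,\Sigma_{\alpha\beta} + \Sigma_{\alpha\alpha}\,\Sigma_{\beta\beta} = \tfrac{2}{3}\,\sigma_y^{2} \qquad (15)$$

where now the prescribed forcing term is

$$d\sigma^{e}_{\alpha\beta} = \frac{E}{1+\nu}\left[d\varepsilon_{\alpha\beta} - \mu\,d\varepsilon_{\gamma\gamma}\,\delta_{\alpha\beta}\right] \qquad (16)$$

and a second material parameter appears

$$\mu = \frac{1}{3}\,\frac{1-2\nu}{1-\nu} \qquad (17)$$

This system turns out to be more complicated than the system for triaxial states
of stress principally because of coefficient $\mu$. For incompressible materials
($\nu = \tfrac{1}{2}$) $\mu$ vanishes and the two systems behave similarly. It should also be noted
that the yield condition (15) is now represented by an ellipsoid with a major
axis along the $\Sigma_{12}$ coordinate axis and the remaining axes in the plane of the
$\Sigma_{11}$ and $\Sigma_{22}$ coordinate axes at angles of 45° to these axes.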
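As a small check on the biaxial relations, the sketch below evaluates the parameter (17) and the residual of the yield condition (15). It assumes that $\Sigma_{\alpha\beta}$ denotes the in-plane components of the deviatoric quantity (5), so that the out-of-plane normal component equals minus the in-plane trace. The function names are illustrative only.

```python
def mu(nu):
    """Second material parameter of eq. (17)."""
    return (1.0 - 2.0 * nu) / (3.0 * (1.0 - nu))


def biaxial_yield_residual(s11, s22, s12, sigma_y):
    """Left side minus right side of eq. (15) for in-plane components Sigma_ab."""
    double_dot = s11 ** 2 + s22 ** 2 + 2.0 * s12 ** 2   # Sigma_ab Sigma_ab
    trace = s11 + s22                                    # Sigma_aa
    return double_dot + trace ** 2 - 2.0 * sigma_y ** 2 / 3.0
```

For a uniaxial stress equal to the yield stress, the deviatoric in-plane components are 2/3 and -1/3 of the yield stress, and the residual vanishes; the same holds for pure shear of magnitude $\sigma_y/\sqrt{3}$.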


3. CODE APPROXIMATIONS TO PLASTICITY EQUATIONS. As already pointed out,
computer  programs  for  calculating  the  elastic-plastic  response  of  bodies  commonly 
determine  the  strain  and  the  stress  throughout  the  body  at  discrete  time  or 
loading  steps.  The  usual  procedure  involves  first  determining  the  change  in 
the  strain  from  the  prior  step  to  the  current  step  without  recourse  to  plasticity 
equations.  Then  the  values  of  the  stress  and  the  strain  at  the  prior  step  and 
the  strain  at  the  current  step  are  used  in  a  plasticity  algorithm  to  calculate 
the  current  value  of  the  stress.  Some  programs  will  iterate  on  the  current  strain 
and  stress  values,  but  basically  plasticity  algorithms  involve  the  determination  of 
a  current  stress  from  a  prescribed  strain  increment  and  a  known  prior  stress. 

Footnote: Greek indices range over 1, 2.

Because the strains are prescribed at discrete intervals, plasticity algorithms
are almost invariably based upon plasticity equations in which finite differences
replace derivatives. Hence, for example, in the case of triaxial stress, given:

    $\Sigma^{0}_{ij}$ = prior value of the stress,
    $\varepsilon^{0}_{ij}$ = prior value of the strain,
    $\varepsilon_{ij}$ = current value of the strain,

the plasticity algorithm will employ the following approximation to equation (9)

$$\Delta\Sigma_{ij} = \Delta S^{e}_{ij} - \kappa\,\Sigma^{*}_{ij}\,\Delta\lambda \qquad (18)$$

where

$$\Delta S^{e}_{ij} = \frac{E}{1+\nu}\left[\Delta\varepsilon_{ij} - \tfrac{1}{3}\,\Delta\varepsilon_{kk}\,\delta_{ij}\right] \qquad (19)$$

is known from the prescribed strain increment $\Delta\varepsilon_{ij} = \varepsilon_{ij} - \varepsilon^{0}_{ij}$ and where

$$\Sigma^{*}_{ij} = \Sigma^{0}_{ij} + \omega\,\Delta\Sigma_{ij} \qquad (0 \le \omega \le 1) \qquad (20)$$

is some intermediate value between the prescribed prior value $\Sigma^{0}_{ij}$ and the as
yet to be determined current value $\Sigma_{ij} = \Sigma^{0}_{ij} + \Delta\Sigma_{ij}$. Equation (18) can be regarded
as a forward, central, or backward finite difference depending on the value of $\omega$:

    $\omega = 0$ ; forward difference
    $\omega = \tfrac{1}{2}$ ; central difference
    $\omega = 1$ ; backward difference

A second approximation often employed in algorithms (ref. 3, 4) involves
replacing the yield condition imposed on the current value of the stress by a
linear approximation to this condition. In terms of the previous example, this
means that rather than the current stress satisfying the exact yield condition*

$$2\,\Sigma^{0}_{ij}\,\Delta\Sigma_{ij} + \Delta\Sigma_{ij}\,\Delta\Sigma_{ij} = 0 \qquad (21)$$

it is required to satisfy the linear approximation

$$2\,\Sigma^{0}_{ij}\,\Delta\Sigma_{ij} = 0 \qquad (22)$$

*This relation follows from (11) on assuming that the prior stress satisfies the
yield condition. This is safely assumed since a simple elastic calculation can
be used to eliminate any portion of the strain increment inside the yield surface;
for example, see Table 4.7 in ref. 5.

This approximation assumes that the stress increment is small enough to permit the
square terms to be neglected in comparison with the linear terms. However, the
approximation results in an error by computing a stress outside the yield surface.
To understand the reason for this, consider equation (22) as a vector relation in
stress space, as illustrated in Figure 2 for the case of a forward difference
($\omega = 0$) flow rule approximation. Equation (22) requires the stress increment
to be perpendicular to the direction of the yield surface normal and hence
tangent to the surface. Because the von Mises surface is strictly convex, the
increment will determine a stress outside the yield surface unless a correction
is applied, as for example shown in Figure 2 where a corrected stress increment
$\Sigma^{c}_{ij}$ is obtained by a radial correction. As pointed out in references 6 and 7,
the lack of a correction in the kinematic hardening subroutine of the ADINA finite
element program (ref. 3) had been found to cause a premature termination due to
the cumulative effect of uncorrected increments. A correction to this subroutine
based on the combined use of a central flow rule approximation and the exact yield
condition has been implemented in the ADINA code and is detailed in reference 7.

Figure 2. Illustration showing how the use of the linearized yield condition
determines a stress increment outside the yield surface that requires
correcting.
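The drift off the yield surface and its radial correction can be illustrated with a short sketch. It is not taken from ADINA or from the references. It assumes the forward difference form of (18) with the linearized condition (22), stores tensors as flat component lists, and uses hypothetical function names.

```python
import math


def dot(u, v):
    """Inner product of two flat component lists."""
    return sum(x * y for x, y in zip(u, v))


def forward_linearized_update(sigma0, dS_e):
    """Forward difference (omega = 0) with the linearized yield condition (22).

    sigma0 is assumed to lie on the yield surface. Returns the uncorrected
    stress and the stress after a radial correction back to the surface.
    """
    r2 = dot(sigma0, sigma0)                  # squared yield radius, (2/3) sigma_y^2
    kappa_dlam = dot(sigma0, dS_e) / r2       # kappa * delta_lambda from (22)
    trial = [s + d - kappa_dlam * s for s, d in zip(sigma0, dS_e)]
    scale = math.sqrt(r2 / dot(trial, trial))
    return trial, [scale * t for t in trial]
```

In normalized two-component form, `forward_linearized_update([1.0, 0.0], [0.3, 0.4])` leaves the uncorrected stress at squared radius 1.16, and the radial correction rotates the stress through an angle whose tangent is 0.4.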

On the other hand, when the exact yield condition (21) is used, as illustrated in
Figure 3, the parameter $\Delta\lambda$ takes on just the right value to give a stress on the
yield surface.

Figure 3. Illustration showing how the use of the exact yield condition maintains
the stress on the yield surface.

While  the  use  of  the  flow  rule  approximation  is  more  or  less  dictated  by  the 
discrete  nature  of  the  code  calculations,  the  use  of  the  linearized  yield  condition 
is  hard  to  justify.  Since  modern  computers  can  solve  complex  algebraic  relations 
very  efficiently,  there  seems  little  reason  not  to  use  the  exact  quadratic  expression. 
In fact, in the case of triaxial stress, when the exact condition (21) is used,
the equation to be solved for $\Delta\lambda$ is at worst quadratic and if the central flow
rule approximation ($\omega = \tfrac{1}{2}$) is used the relation on $\Delta\lambda$ becomes linear (ref. 7).
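The remark about the quadratic can be made concrete. Substituting (18) and (20) into the exact yield condition gives a quadratic in $L = \kappa\,\Delta\lambda$ whose leading coefficient is proportional to $(1 - 2\omega)$, so it degenerates to a linear equation for the central difference. The sketch below is a derivation made for this illustration, not the algorithm of ref. 7. The coefficient names `a`, `b`, `c`, the flat-list tensor layout, and the choice of root are assumptions.

```python
import math


def dot(u, v):
    """Inner product of two flat component lists."""
    return sum(x * y for x, y in zip(u, v))


def exact_yield_update(sigma0, dS_e, omega):
    """Solve (18) and (20) with the exact yield condition for the new stress.

    sigma0 (assumed on the yield surface) and dS_e are flat component lists.
    Returns (sigma, L) with L = kappa * delta_lambda.
    """
    r2 = dot(sigma0, sigma0)
    B = dot(sigma0, dS_e)
    C = dot(dS_e, dS_e)
    # Quadratic a L^2 + b L + c = 0; a vanishes for omega = 1/2.
    a = r2 * (1.0 - 2.0 * omega)
    b = -2.0 * (r2 + (1.0 - omega) * B)
    c = 2.0 * B + C
    # Root that tends to zero with the increment, in a form valid for a = 0.
    L = 2.0 * c / (-b + math.sqrt(b * b - 4.0 * a * c))
    sigma = [((1.0 - (1.0 - omega) * L) * s + d) / (1.0 + omega * L)
             for s, d in zip(sigma0, dS_e)]
    return sigma, L
```

For any of the three values of $\omega$ the returned stress lies on the yield surface, so no correction step is needed.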


It is not uncommon practice in codes (ref. 3 and 8) to subincrement the plasticity
algorithm in order to increase accuracy. Subincrementing involves dividing the
strain increment into a number of equal subincrements, as depicted for example in
Figure 4 for three subincrements. The plasticity algorithm is applied sequentially
to each subincrement with the prior stress being updated at each step:
$\Sigma^{0}_{ij}, \Sigma^{1}_{ij}, \Sigma^{2}_{ij}, \ldots$, so that the normal direction changes at each subincrement step.
In this way, subincrementing results in a closer simulation of the differential
system (9) through (11). Subincrementing can be used with any of the flow rule
approximations in combination with either the exact yield condition or the
linearized condition (plus correction). We shall be evaluating the accuracy of a
number of these approximations by comparing them with an exact solution that
represents the limit of the subincrement process as the individual subincrements
go to zero.
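The convergence of subincrementing toward the exact solution can be sketched using the polar forms derived later in Section 5: the forward difference step with exact yield is (41) with $\omega = 0$, and the limit is the closed-form solution (40). The function names and the sample values are assumptions for illustration.

```python
import math


def exact_angle(eta, alpha):
    """Exact rotation theta' implied by (40) with theta_0 = 0."""
    return alpha - 2.0 * math.atan(math.exp(-eta) * math.tan(0.5 * alpha))


def forward_subincremented(eta, alpha, n):
    """Apply the forward-difference/exact-yield step (omega = 0 in (41)) n times.

    Each step uses the subincrement eta / n and the updated polar angle.
    """
    theta = 0.0
    for _ in range(n):
        theta += math.asin((eta / n) * math.sin(alpha - theta))
    return theta
```

With `eta = 0.5` and `alpha = math.pi / 2`, the difference between `forward_subincremented` and `exact_angle` shrinks as `n` grows, roughly in proportion to `1 / n`, as expected of a first-order scheme.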


Corresponding remarks about approximating the flow rule and the yield
condition and about subincrementing also apply in the case of biaxial stress.

Figure 4. Illustration showing the division of the increment $\Delta S^{e}_{ij}$ into three
equal subincrements in order to increase the accuracy of the plasticity
calculation.

4.  PROPORTIONAL  INCREMENTAL  STRAINING.  In  order  to  integrate  the  differential 
system  (9)  through  (11)  it  is  necessary  to  make  an  assumption  about  how  the  strain 
varies  from  the  prior  to  the  current  step.  We  simply  assume  that  the  strain 
increases  in  some  continuous  way  from  its  initial  to  its  final  value  along  the 
direction  defined  by  the  discrete  strain  increment.  Hence,  the  components  of  the 
continuously  increasing  strain  increment  will  be  in  the  same  proportion  as  the 
corresponding  components  of  the  discrete  strain  increment.  In  the  case  of  triaxial 
stress,  this  assumption  means  that  during  the  strain  increment  the  forcing  function 

$S^{e}_{ij}$ in (9) is proportional to the increment $\Delta S^{e}_{ij}$:

$$S^{e}_{ij} = f(\lambda)\,\Delta S^{e}_{ij} \qquad (23)$$

where $f(\lambda)$ increases from zero to one as $\lambda$ goes from zero to $\Delta\lambda$. With this
assumption, (9) becomes

$$\frac{d\Sigma_{ij}}{d\lambda} = \frac{df}{d\lambda}\,\Delta S^{e}_{ij} - \kappa\,\Sigma_{ij} \qquad (24)$$

where $\Sigma_{ij}$ satisfies the yield condition (11) for all values of $0 \le \lambda \le \Delta\lambda$.
At the limits we have:

$$\lambda = 0 \;\Rightarrow\; f = 0 \;\;\&\;\; \Sigma_{ij} = \Sigma^{0}_{ij}$$
$$\lambda = \Delta\lambda \;\Rightarrow\; f = 1 \;\;\&\;\; \Sigma_{ij} = \Sigma^{0}_{ij} + \Delta\Sigma_{ij} \qquad (25)$$

Equation (24) can be regarded as the continuous counterpart of the discrete flow
equation (18). For the case of biaxial stress, the corresponding assumption
yields a similar simplification of (14).


5. TRANSFORMATION OF TRIAXIAL EQUATIONS. In order to facilitate the analysis
of the previous equations, a transformation of coordinates in stress space is
required. For the triaxial stress case, we transform to a set of orthogonal
coordinates in the plane of the vectors $\Sigma^{0}_{ij}$ and $\Delta S^{e}_{ij}$, while simultaneously
normalizing coordinates with respect to the yield stress $\sigma_y$. Although not necessary,
for convenience one coordinate axis is made to coincide with the direction of
$\Sigma^{0}_{ij}$ by defining the basis vector $a_{ij}$ as follows:

$$a_{ij} = \sqrt{\tfrac{3}{2}}\;\frac{\Sigma^{0}_{ij}}{\sigma_y} \qquad (26)$$

It follows from the yield condition (11) that $a_{ij}$ is a unit vector (i.e., $a_{ij}\,a_{ij} = 1$).
Using the Gram-Schmidt process, a second unit vector $b_{ij}$ orthogonal to the
first (i.e., $a_{ij}\,b_{ij} = 0$) is defined by requiring that it satisfy the equation

$$\sqrt{\tfrac{3}{2}}\;\frac{\Delta S^{e}_{ij}}{\sigma_y} = \eta_1\,a_{ij} + \eta_2\,b_{ij} \qquad (27)$$

The components of $\Delta S^{e}_{ij}$ relative to this new basis will be

$$\eta_1 = \sqrt{\tfrac{3}{2}}\;\frac{\Delta S^{e}_{ij}\,a_{ij}}{\sigma_y} \quad\&\quad \eta_2 = \sqrt{\tfrac{3}{2}}\;\frac{\Delta S^{e}_{ij}\,b_{ij}}{\sigma_y} \qquad (28)$$

Resolving $\Sigma_{ij}$ relative to this basis

$$\sqrt{\tfrac{3}{2}}\;\frac{\Sigma_{ij}}{\sigma_y} = \tau_1\,a_{ij} + \tau_2\,b_{ij} \qquad (29)$$

the components of $\Sigma_{ij}$ become

$$\tau_1 = \sqrt{\tfrac{3}{2}}\;\frac{\Sigma_{ij}\,a_{ij}}{\sigma_y} \quad\&\quad \tau_2 = \sqrt{\tfrac{3}{2}}\;\frac{\Sigma_{ij}\,b_{ij}}{\sigma_y} \qquad (30)$$

It follows from (26) that initially these components will have the values $\tau^{0}_1 = 1$
and $\tau^{0}_2 = 0$. By substituting (27) and (29) in (18), the discrete flow equation is
transformed into the pair

$$\Delta\tau_1 = \eta_1 - \tau^{*}_1\,\Delta\Lambda \quad\&\quad \Delta\tau_2 = \eta_2 - \tau^{*}_2\,\Delta\Lambda \qquad (31)$$

where, as before, $\tau^{*}_\alpha$ ($\alpha = 1, 2$) are intermediate values:

$$\tau^{*}_\alpha = \tau^{0}_\alpha + \omega\,\Delta\tau_\alpha \qquad (0 \le \omega \le 1) \qquad (32)$$

and where the flow parameter is redefined as

$$\Delta\Lambda = \kappa\,\Delta\lambda \qquad (33)$$

With this transformation the yield condition (11) becomes the unit circle

$$\tau_1^{2} + \tau_2^{2} = 1 \qquad (34)$$

which both the initial and final values of $\tau_1$ and $\tau_2$ are to satisfy.
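The construction of the basis in (26) through (28) can be sketched as follows. Tensors are stored as flat lists of their nine components. The function name and the handling of the degenerate case in which the increment is parallel to the prior stress are assumptions made for this illustration.

```python
import math


def dot(u, v):
    """Inner product of two flat component lists."""
    return sum(x * y for x, y in zip(u, v))


def transformed_components(sigma0, dS_e, sigma_y):
    """Return (eta1, eta2, a, b) following (26) to (28).

    sigma0 is assumed to satisfy the yield condition (11), so a is a unit vector.
    """
    scale = math.sqrt(1.5) / sigma_y
    a = [scale * s for s in sigma0]                       # eq. (26)
    eta1 = scale * dot(dS_e, a)                           # first of eqs. (28)
    residual = [scale * d - eta1 * ai for d, ai in zip(dS_e, a)]
    eta2 = math.sqrt(dot(residual, residual))             # Gram-Schmidt remainder
    # If the increment is parallel to sigma0, b is undefined; return zeros.
    b = [r / eta2 for r in residual] if eta2 > 0.0 else [0.0] * len(a)
    return eta1, eta2, a, b
```

Choosing $\eta_2 \ge 0$ fixes the sign of $b_{ij}$; the squared magnitude $\eta_1^2 + \eta_2^2$ then equals $\tfrac{3}{2}\,\Delta S^{e}_{ij}\,\Delta S^{e}_{ij}/\sigma_y^2$.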

Applying  the  same  transformation  to  the  equation  of  proportional  incremental 
straining  causes  the  continuous  flow  equation  (24)  to  reduce  to 


$$\frac{d\tau_1}{d\Lambda} = \frac{df}{d\Lambda}\,\eta_1 - \tau_1 \quad\&\quad \frac{d\tau_2}{d\Lambda} = \frac{df}{d\Lambda}\,\eta_2 - \tau_2 \qquad (35)$$

From (25) the limit values follow:

$$\Lambda = 0 \;\Rightarrow\; f = 0 \;\;\&\;\; \tau_\alpha = \tau^{0}_\alpha$$
$$\Lambda = \Delta\Lambda \;\Rightarrow\; f = 1 \;\;\&\;\; \tau_\alpha = \tau^{0}_\alpha + \Delta\tau_\alpha \qquad (36)$$

where $\alpha = 1, 2$. The components $\tau_\alpha$ are required to satisfy the yield condition (34)
continuously between the limits. The results of this transformation for both the
discrete and the continuous case are summarized in Table 1.

The problem in transformed coordinates is portrayed in Figure 5, reduced to
its essential ingredients. Because the yield condition corresponds to a unit
circle, it will be automatically satisfied when polar coordinates are employed.
Hence, let us set

$$\tau_1 = \cos\theta \qquad \tau_2 = \sin\theta$$
$$\eta_1 = \eta\cos\alpha \qquad \eta_2 = \eta\sin\alpha \qquad (37)$$

where

$$\eta = \sqrt{\tfrac{3}{2}}\;\frac{\left(\Delta S^{e}_{ij}\,\Delta S^{e}_{ij}\right)^{1/2}}{\sigma_y} \qquad (38)$$

Figure 5. Representation of triaxial stress problem in transformed stress
coordinates.
[Figure labels: $\tau_1 = \cos\theta$, $\tau_2 = \sin\theta$; $\tau^{0}_1 = \cos\theta_0$, $\tau^{0}_2 = \sin\theta_0$; $\eta_1 = \eta\cos\alpha$, $\eta_2 = \eta\sin\alpha$. Given: $\eta$, $\alpha$, $\theta_0$. Find: $\theta$.]

Substituting (37) into the continuous flow equations (35), we obtain a differential
equation

$$\frac{d\theta}{df} = \eta\,\sin(\alpha - \theta) \qquad (39)$$

on the single unknown $\theta$. Integrating this equation between the limits (36), we
obtain the solution

$$\tan\frac{\theta - \alpha}{2} = e^{-\eta}\,\tan\frac{\theta_0 - \alpha}{2} \qquad (40)$$

where $\theta_0$ represents the initial value of the polar angle and $\theta$ the final value.
Making a similar substitution in the discrete flow equations (31) and (32) results
in

$$\sin(\theta - \theta_0) = \eta\left[(1-\omega)\,\sin(\alpha - \theta_0) + \omega\,\sin(\alpha - \theta)\right] \qquad (41)$$

Notice that both the discrete and the continuous solutions are independent of the
initial stress, as specified by $\theta_0$; that is, both solutions are essentially
functions of differences

$$\theta' = \theta - \theta_0 \quad\&\quad \alpha' = \alpha - \theta_0 \qquad (42)$$

We shall find that this is not true in the case of biaxial stress. It is for the
sake of comparison with the biaxial case that we have chosen not to utilize the
fact that initially $\theta_0 = 0$, which earlier was shown to follow from (26).
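The closed-form solution (40) can be checked against a direct numerical integration of (39). The sketch below uses a classical fourth-order Runge-Kutta scheme, which is a choice made here for illustration and is not part of the paper. The function names and the step count are assumptions.

```python
import math


def exact_theta(eta, alpha, theta0):
    """Closed-form solution (40) solved for the final polar angle theta."""
    return alpha + 2.0 * math.atan(math.exp(-eta) * math.tan(0.5 * (theta0 - alpha)))


def integrate_theta(eta, alpha, theta0, steps=200):
    """Integrate d(theta)/df = eta sin(alpha - theta), eq. (39), for f from 0 to 1."""
    def rate(theta):
        return eta * math.sin(alpha - theta)

    h = 1.0 / steps
    theta = theta0
    for _ in range(steps):
        k1 = rate(theta)
        k2 = rate(theta + 0.5 * h * k1)
        k3 = rate(theta + 0.5 * h * k2)
        k4 = rate(theta + h * k3)
        theta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return theta
```

The two functions agree to high precision, and `exact_theta(eta, alpha, theta0) - theta0` depends only on `alpha - theta0`, which is the independence from the initial stress noted in (42).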

6.  TRANSFORMATION  OF  BIAXIAL  EQUATIONS.  The  transformation  of  coordinates 
for  the  biaxial  stress  case  is  composed  of  a  rotation  that  brings  the  coordinate 
axes  into  coincidence  with  the  principal  axes  of  the  yield  surface  ellipsoid 
followed  by  a  normalization  of  the  ellipsoid  to  a  unit  sphere.  The  transformation 
is defined by the equations

(43)

so that $\tau_\alpha$ and $\eta_\alpha$ ($\alpha = 1, 2, 3$) are the transformed components of $\Sigma_{\alpha\beta}$ and $\Delta\sigma^{e}_{\alpha\beta}$,
respectively. Under this transformation the yield condition (15) becomes

$$\tau_1^{2} + \tau_2^{2} + \tau_3^{2} = 1 \qquad (44)$$
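The body of equations (43) did not survive in this copy. One transformation that is consistent with the yield condition (15), with the unit sphere (44), and with the later remark that the third equation drops out when the shear component vanishes is sketched below. The signs and the assignment of the second and third components are assumptions made here and are not confirmed by the source.

```latex
\tau_1 = \frac{3}{2}\,\frac{\Sigma_{11}+\Sigma_{22}}{\sigma_y}, \qquad
\tau_2 = \frac{\sqrt{3}}{2}\,\frac{\Sigma_{11}-\Sigma_{22}}{\sigma_y}, \qquad
\tau_3 = \sqrt{3}\,\frac{\Sigma_{12}}{\sigma_y},
\\[4pt]
\eta_1 = \frac{3}{2}\,\frac{\Delta\sigma^{e}_{11}+\Delta\sigma^{e}_{22}}{\sigma_y}, \qquad
\eta_2 = \frac{\sqrt{3}}{2}\,\frac{\Delta\sigma^{e}_{11}-\Delta\sigma^{e}_{22}}{\sigma_y}, \qquad
\eta_3 = \sqrt{3}\,\frac{\Delta\sigma^{e}_{12}}{\sigma_y},
\\[4pt]
% consistency check against (15):
\tau_1^{2}+\tau_2^{2}+\tau_3^{2}
  = \frac{3}{2\,\sigma_y^{2}}
    \left[\Sigma_{\alpha\beta}\,\Sigma_{\alpha\beta}+\Sigma_{\alpha\alpha}\,\Sigma_{\beta\beta}\right] = 1 .
```

With this choice the first component is proportional to the in-plane trace, which is the only component affected by the parameter $\mu$ in (14).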


On comparing (9) and (14) it easily follows that the discrete flow equation,
which is the biaxial stress counterpart of (18), is transformed into the trio

$$\Delta\tau_1 = \eta_1 - (1-\delta)\,\tau^{*}_1\,\Delta\Lambda$$
$$\Delta\tau_2 = \eta_2 - \tau^{*}_2\,\Delta\Lambda \qquad (45)$$
$$\Delta\tau_3 = \eta_3 - \tau^{*}_3\,\Delta\Lambda$$

where $\tau^{*}_\alpha$ ($\alpha = 1, 2, 3$) are intermediate values as given by (32), where $\Delta\Lambda$ is
defined by (33), and where

$$\delta = \frac{2\mu}{\kappa} \qquad (46)$$

Similarly, the continuous flow equation for biaxial stress is transformed into

$$\frac{d\tau_1}{d\Lambda} = \frac{df}{d\Lambda}\,\eta_1 - (1-\delta)\,\tau_1$$
$$\frac{d\tau_2}{d\Lambda} = \frac{df}{d\Lambda}\,\eta_2 - \tau_2 \qquad (47)$$
$$\frac{d\tau_3}{d\Lambda} = \frac{df}{d\Lambda}\,\eta_3 - \tau_3$$

where the limit values (36) apply with $\alpha = 1, 2, 3$.

For the sake of comparison, the transformed biaxial equations are also
summarized in Table 1. It is clear that for both the discrete and the continuous
equations, the triaxial stress equations are simpler than the biaxial stress
equations, there being only two equations involved and these being more symmetric.
The biaxial equations are more complex because of the occurrence of the $\delta$ term
which as we can see from (17) and (46) is caused by the elastic compressibility of
the material. In fact, when the material is incompressible, so that $\nu = \tfrac{1}{2}$ and hence
$\delta = 0$, a simple coordinate rotation bringing a coordinate plane into the plane of
the vectors $\tau^{0}$ and $\eta$ transforms the three biaxial equations into the two triaxial
equations.


As  in  the  triaxial  stress  case,  because  the  yield  condition  for  the  biaxial 
case  (44)  corresponds  to  a  unit  sphere,  it  will  be  automatically  satisfied  when 
the angle coordinates $\theta$ and $\phi$, as given by equations

$$\tau_1 = \cos\theta \qquad \tau_2 = \sin\theta\,\cos\phi \qquad \tau_3 = \sin\theta\,\sin\phi \qquad (48)$$

are used. Writing vector $\eta$ in terms of angle coordinates:

$$\eta_1 = \eta\cos\alpha \qquad \eta_2 = \eta\sin\alpha\,\cos\gamma \qquad \eta_3 = \eta\sin\alpha\,\sin\gamma \qquad (49)$$

where

$$\eta = \sqrt{\tfrac{3}{2}}\;\frac{\left(\Delta\sigma^{e}_{\alpha\beta}\,\Delta\sigma^{e}_{\alpha\beta} + \Delta\sigma^{e}_{\alpha\alpha}\,\Delta\sigma^{e}_{\beta\beta}\right)^{1/2}}{\sigma_y} \qquad (50)$$

and substituting these and the previous relations in the continuous flow equations,
a pair of differential equations on $\theta$ and $\phi$ are obtained:

$$\frac{d\theta}{df} = \eta\,\frac{(1-\delta)\,\cos\theta\,\sin\alpha\,\cos(\gamma-\phi) - \sin\theta\,\cos\alpha}{1 - \delta\cos^{2}\theta}$$
$$\frac{d\phi}{df} = \eta\,\frac{\sin\alpha\,\sin(\gamma-\phi)}{\sin\theta} \qquad (51)$$

These equations have so far resisted efforts to find an exact integral.
Consequently, a simpler case of the biaxial stress equations was selected for
study by assuming that $\phi = \gamma$ ($= 0$ for convenience), so that only the single
equation

$$\frac{d\theta}{df} = \eta\,\frac{(1-\delta)\,\cos\theta\,\sin\alpha - \sin\theta\,\cos\alpha}{1 - \delta\cos^{2}\theta} \qquad (52)$$

needs to be integrated. This equation still is more complex than its triaxial
counterpart (39), again because of the $\delta$ term. Physically it corresponds to biaxial
stress situations in which the shear component of stress vanishes (i.e., $\tau_3 = \eta_3 = 0$),
as for example happens in cylindrically symmetric thin shell problems.

In such situations the last of the discrete biaxial equations (45) is trivially
satisfied, while the substitution of (48) and (49) into the first two reduces
them to the single equation

$$\sin(\theta-\theta_0) - \delta\,(\sin\theta - \sin\theta_0)\left[(1-\omega)\cos\theta_0 + \omega\cos\theta\right]$$
$$\quad = \eta\,\{(1-\delta)\,\sin\alpha\left[(1-\omega)\cos\theta_0 + \omega\cos\theta\right] - \cos\alpha\left[(1-\omega)\sin\theta_0 + \omega\sin\theta\right]\} \qquad (53)$$

which is the counterpart of (41). Before analyzing these equations, we replace
the angle $\alpha$ by $\beta$, defined by the relation

$$\tan\beta = (1-\delta)\,\tan\alpha \qquad (54)$$


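so that (52) and (53) become:

$$(1 - \delta\cos^{2}\theta)\,\frac{d\theta}{df} = \eta'\,\sin(\beta - \theta) \qquad (55)$$

and

$$\sin(\theta-\theta_0) - \delta\,(\sin\theta - \sin\theta_0)\left[(1-\omega)\cos\theta_0 + \omega\cos\theta\right]
 = \eta'\left[(1-\omega)\,\sin(\beta-\theta_0) + \omega\,\sin(\beta-\theta)\right] \qquad (56)$$

where

$$\eta' = \eta\left[\cos^{2}\alpha + (1-\delta)^{2}\sin^{2}\alpha\right]^{1/2} \qquad (57)$$

Notice that if $\delta = 0$, then, as expected, (55) and (56) reduce to the triaxial
equations (39) and (41).

The solution to (55) with the appropriate limit values is obtained in implicit
form as

$$\tan\frac{\theta - \beta}{2} = \tan\frac{\theta_0 - \beta}{2}\;e^{-H(\theta)} \qquad (58)$$

where

$$H(\theta) = \frac{\eta' - \delta\left[\cos(\theta+\beta) - \cos(\theta_0+\beta)\right]}{1 - \delta\cos^{2}\beta} \qquad (59)$$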
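Because (58) is implicit in $\theta$, it has to be solved numerically. The sketch below solves it by bisection and compares the result with a direct Runge-Kutta integration of (55). Both numerical methods, the function names, and the bracketing interval (which assumes $0 < \beta - \theta_0 < \pi$) are choices made for this illustration and are not taken from the paper.

```python
import math


def H(theta, theta0, beta, eta_p, delta):
    """Exponent H(theta) of eq. (59); eta_p stands for eta'."""
    shift = math.cos(theta + beta) - math.cos(theta0 + beta)
    return (eta_p - delta * shift) / (1.0 - delta * math.cos(beta) ** 2)


def solve_implicit(theta0, beta, eta_p, delta, iterations=100):
    """Solve eq. (58) for theta by bisection on the interval [theta0, beta]."""
    t0 = math.tan(0.5 * (theta0 - beta))

    def residual(theta):
        return (math.tan(0.5 * (theta - beta))
                - t0 * math.exp(-H(theta, theta0, beta, eta_p, delta)))

    lo, hi = theta0, beta          # residual(lo) < 0 < residual(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def integrate(theta0, beta, eta_p, delta, steps=400):
    """Fourth-order Runge-Kutta integration of eq. (55) for f from 0 to 1."""
    def rate(theta):
        return eta_p * math.sin(beta - theta) / (1.0 - delta * math.cos(theta) ** 2)

    h = 1.0 / steps
    theta = theta0
    for _ in range(steps):
        k1 = rate(theta)
        k2 = rate(theta + 0.5 * h * k1)
        k3 = rate(theta + 0.5 * h * k2)
        k4 = rate(theta + h * k3)
        theta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return theta
```

For `delta = 0` the exponent reduces to `eta_p` and the bisection result coincides with the triaxial closed form (40) with $\alpha$ replaced by $\beta$.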


7.  COMPARISON  OF  APPROXIMATIONS:  TRIAXIAL  STRESS.  In  this  section  we  present 
a  comparison  of  the  differences  between  the  exact  solution  to  the  triaxial  case 
(40) and four particular approximations to (41):

a. Forward difference ($\omega = 0$) and linearized yield (22) (plus radial correction)

b. Forward difference ($\omega = 0$) and exact yield (21)

c. Central difference ($\omega = \tfrac{1}{2}$) and exact yield

d. Backward difference ($\omega = 1$) and exact yield

We have already noted that in the triaxial stress case the solutions are independent
of $\theta_0$, so that only $\alpha'$ and $\eta$ need be varied, see Figure 5. For each value of $\alpha'$
and $\eta$ the difference $\Delta\theta$ between the value of $\theta'$ computed using one of the above
approximations and the value using the exact solution will be determined. Due
to symmetry, we need only consider values of $\alpha'$ between 0° and 90°; notice that
when $\alpha' = 0$ both the exact solution and the approximations give the trivial
solution $\theta' = 0$ for all values of $\eta$. The value of $\eta$, as we can see from its
definition (38) or from Figure 5, measures the ratio of the magnitude of the
increment $\Delta S^{e}_{ij}$ to the yield stress. The angle $\theta'$ (in radians) can be regarded
as measuring the magnitude of the stress trajectory on the yield surface relative
to the yield stress.


The  solutions  for  the  four  approximations  can  be  obtained  by  geometric 
construction,  as  shown  in  Figures  6  through  9.  The  exact  solution  is  obtained 
from  (40) : 


9' 

tan  — 


(60) 


The  difference  between  the  value  of  0'  determined  from  an  approximation  and  the 
exact  value  is  defined  as  the  error: 


A0  =  0 


Approx 


0' 

Exact 


(61) 


Within the range 0° < α' < 90°, the solution θ' will be positive; hence, Δθ will
be positive or negative depending on whether an approximation over- or underpredicts.
[Figure 6 panel labels: forward difference; linearized yield; radial correction;
result: tan θ' = η sin α']

Figure 6. Solution for the forward difference and linearized yield condition
approximation with radial correction.


[Figure 7 panel labels: forward difference; exact yield;
result: sin θ' = η sin α']

Figure 7. Solution for the forward difference and exact yield condition
approximation.
Figure 8. Solution for the central difference and exact yield condition
approximation.


Figure  9.  Solution  for  the  backward  difference  and  exact  yield  condition 
approximation. 
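
As a rough illustration of how approximations (a) and (b) differ, the following Python sketch evaluates the two closed forms as read from Figures 6 and 7, namely tan θ' = η sin α' and sin θ' = η sin α'. The function names are hypothetical, and the exact solution (60) is not reproduced here, so the sketch compares the two approximations with each other rather than with the exact value.

```python
import math


def theta_forward_linearized(eta, alpha_deg):
    """Approximation (a): tan(theta') = eta * sin(alpha'), theta' in radians."""
    return math.atan(eta * math.sin(math.radians(alpha_deg)))


def theta_forward_exact_yield(eta, alpha_deg):
    """Approximation (b): sin(theta') = eta * sin(alpha'), theta' in radians."""
    return math.asin(eta * math.sin(math.radians(alpha_deg)))


if __name__ == "__main__":
    eta = 0.10  # stress increment equal to 10% of yield
    for alpha in (0, 30, 60, 90):
        a = theta_forward_linearized(eta, alpha)
        b = theta_forward_exact_yield(eta, alpha)
        print(f"alpha' = {alpha:2d} deg   (a) {a:.6f}   (b) {b:.6f}   (b)-(a) {b - a:.6f}")
```

Since arcsin x ≥ arctan x for 0 ≤ x ≤ 1, approximation (b) never gives a smaller θ' than (a) in this range, which agrees with the ordering of the two forward difference columns in Table 2.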


In Figure 10 we plot the errors Δθ at 5° intervals in α' for the four approximations
with η = 10% and 50%. For both values of η the graphs show the same trends. When
the exact yield condition is used, both the forward and central differences overpredict
over the entire range, while the backward difference underpredicts. When the
linearized yield condition and the forward difference are used, the approximation
overpredicts over most of the range, except near the end as α' approaches 90°, where
the error crosses over and the approximation underpredicts. Also, except for the
central difference approximation, which achieves its maximum error at the end of the
range, all the other approximations reach their maximum errors more or less near
mid-range. Over the entire range the central difference approximation is clearly
superior, with only the forward difference/linearized yield approximation giving a
smaller error in the neighborhood of the cross-over point.

Table 2 gives the maximum error for each approximation over a range of values
of η between 5% and 50%.


Table 2. Maximum Error as a Function of η

        Linearized Yield                  Exact Yield
  η       Forward Diff     Forward Diff   Central Diff   Backward Diff

 .05        .000610          .000632        .000010        -.000596
 .10        .002379          .002555        .000083        -.002271
 .15        .005210          .005844        .000279        -.004897
 .20        .009007          .010554        .000657        -.008329
 .25        .013672          .016739        .001274        -.012440
 .30        .019111          .024500        .002181        -.017116
 .35        .025306          .034043        .003426        -.022359
 .40        .032135          .045368        .005050        -.027999
 .45        .039519          .058597        .007090        -.033943
 .50        .047384          .074306        .009576        -.040110


At 5% the central difference approximation is about 60
times more accurate than any of the other approximations. As η increases to 50%,
the advantage of the central approximation diminishes to an accuracy of four to
eight times greater than its competitors. As is to be expected, the table makes
clear the benefits of subincrementing for any approximation. For example,
using the central difference approximation, if η = .5, the maximum possible
error can be .009576; if five subincrements are used so that η = .1 for each
subincrement, the maximum possible error is reduced to .000415; and if ten subincrements
are used, the maximum error is further reduced to .00020. It is also clear that
if subincrementing is automatically performed whenever η > .05, then the central
approximation will retain its 60 : 1 advantage in accuracy over the other approximations.
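
The subincrementing arithmetic in the preceding paragraph can be sketched in a few lines of Python. The sketch assumes, as the worked example in the text does, that the worst-case error of a subincremented step is the number of subincrements times the tabulated maximum error for one subincrement. The dictionary holds a few central difference entries copied from Table 2; the function name is hypothetical.

```python
# Selected maximum errors for the central difference approximation (Table 2),
# keyed by eta, the stress increment as a fraction of yield.
MAX_ERR_CENTRAL = {0.05: 0.000010, 0.10: 0.000083, 0.25: 0.001274, 0.50: 0.009576}


def worst_case_error(eta_total, n_sub, table=MAX_ERR_CENTRAL):
    """Upper bound on the accumulated error when eta_total is split into n_sub
    equal subincrements, assuming the per-subincrement maxima simply add."""
    eta_sub = round(eta_total / n_sub, 2)  # must match a tabulated eta
    return n_sub * table[eta_sub]


if __name__ == "__main__":
    print(worst_case_error(0.50, 1))  # single increment of 50% of yield
    print(worst_case_error(0.50, 5))  # five subincrements of 10% of yield each
```

With five subincrements the bound is 5 × .000083 = .000415, the figure quoted in the text.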

8. COMPARISON OF APPROXIMATIONS: BIAXIAL STRESS. In this section we compare
the deviations from the exact solution that result from three particular approximations
to the biaxial stress equations. As remarked in Section 6, the comparison will not
be for the general biaxial equations, but for the special case where the shear
component vanishes. However, this comparison should provide some estimates of
the magnitude of the errors connected with these approximations for the general case.

The three approximations to be compared are:

a. Forward difference (ω = 0) and linearized yield (plus radial correction)

b. Forward difference (ω = 0) and exact yield

c. Central difference (ω = 1/2) and exact yield

The equation for the first approximation follows from equations (21), (43), (45),
(48), (49), and (54) after considerable manipulation and can be written in terms
of polar coordinates as follows:

$$\tan\theta' = \frac{\eta'\sin\beta'}{1-\delta\cos^2\theta_0} \qquad (62)$$

where θ' and η' are defined by (42) and (57) and where

$$\beta' = \beta - \beta_0 \qquad (63)$$

with β given by (54). The equations for the second and third approximations follow
from equation (56) after setting ω = 0 and 1/2, respectively:

$$[Y(\theta')]^2 - 2(1-\delta\cos^2\theta_0)\,Y(\theta') + \eta'\sin\beta'\,(\eta'\sin\beta' - 2\delta\sin\theta_0\cos\theta_0) = 0 \qquad (64)$$

$$[Y(\theta')]^3 - [2(1-\delta\cos^2\theta_0) + \eta'\cos\beta']\,[Y(\theta')]^2 + \eta'\sin\beta'\,(\eta'\sin\beta' - 4\delta\sin\theta_0\cos\theta_0)\,Y(\theta') - (\eta'\sin\beta')^2\,[2(1-\delta\sin^2\theta_0) + \eta'\cos\beta'] = 0 \qquad (65)$$

where the function containing the unknown θ' is defined as:

$$Y(\theta') = \frac{\eta'\sin\beta'}{\tan(\theta'/2)} \qquad (66)$$
Notice that the second approximation involves solving a quadratic equation for the
unknown and the third involves a cubic. Interestingly, when these approximations
are implemented in a computer code for the general biaxial stress situation, the
equations used to determine the flow parameter increments Δλ turn out to be,
respectively, quadratic and cubic.

The exact value of θ' to which the approximate values obtained above will be
compared is found by solving the transcendental equation (derived from (58)
and (59))

26  *  X  .  .lco?g . ■+.  =  n,  +  (]__6  cos2 0)  In  b-  (67) 

1  +  X2  1  ♦  x02  Xo 

for  the  variable 

X  =  tan  (68) 

where 

Xo  *  tan  (69) 


This  transcendental  equation  is  easily  solved  on  the  computer  using  a  Newton-Raphson 
algorithm. 
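
For readers unfamiliar with the method, a generic Newton-Raphson iteration can be sketched as follows. This is only an illustration of the algorithm: the residual used in the example, x + ln x = c, is a stand-in chosen because it is transcendental and easy to check, and it is not equation (67). The function name, tolerance and iteration limit are assumptions.

```python
import math


def newton_raphson(f, dfdx, x0, tol=1e-12, max_iter=50):
    """Find a root of f starting from x0, given the derivative dfdx."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / dfdx(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


if __name__ == "__main__":
    c = 1.5  # right-hand side of the stand-in equation x + ln(x) = c
    root = newton_raphson(
        lambda x: x + math.log(x) - c,  # residual
        lambda x: 1.0 + 1.0 / x,        # derivative of the residual
        x0=1.0,
    )
    print(root, root + math.log(root) - c)
```

In practice one would replace the two lambdas with the residual of the transcendental equation and its derivative with respect to X, and start the iteration from X₀.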


As in the previous section, the error is defined as the difference between the
values of θ' obtained from each of the three approximations and the exact solution.
Unlike the triaxial case, both the exact and the approximate solutions will depend
on the initial value θ₀, so that now, in addition to α' and η, θ₀ will have to be
varied over a suitable range. Moreover, due to the θ₀ dependence, the equations
no longer have a trivial solution when α' = 0 and the solutions are no longer
symmetrically distributed about α' = 0. Hence, α' must be varied over the range
of values from -90° to +90°, while due to symmetry θ₀ need only vary from 0° to 90°.
Also, the trivial solution is obtained when β' = 0, corresponding to the value


$$\alpha' = \alpha_0 = \tan^{-1}\frac{\delta\sin\theta_0\cos\theta_0}{1-\delta\cos^2\theta_0} \qquad (70)$$


At this angle, exact and approximate solutions change sign. Therefore, to ensure
that overpredictions will be positive and underpredictions negative, the error
needs to be redefined as

$$\Delta\theta = (\theta'_{\text{Approx}} - \theta'_{\text{Exact}})\,\operatorname{sign}(\theta'_{\text{Exact}}) \qquad (71)$$
With this definition in mind, the errors resulting from solving the previous
equations at 10° intervals in α' for the values θ₀ = 0°, 45°, 90° and η = 10%
(assuming that δ = 1/3) are plotted in Figure 11. Comparison with the η = 10%
graph in Figure 10 shows that the magnitudes of the errors are approximately the
same and that the same general trends persist: the central approximation is
clearly superior and achieves its maximum error at the ends of the interval
(α' = ±90°), while the linearized forward and exact forward approximations reach
their maxima near the middle range (α' = ±45°). We also see the effects of the
dependence on the initial angle θ₀ in the diminishing errors as θ₀ goes from 0° to 90°
and in the shifting of the Δθ = 0 solutions from the origin; e.g., when θ₀ = 45°
the trivial solutions occur at α' = 11.3°, but at the values θ₀ = 0° and 90° there
is no shifting due to symmetry.
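
The shift of the trivial solution can be checked numerically from equation (70), read here as α₀ = tan⁻¹[δ sin θ₀ cos θ₀ / (1 − δ cos² θ₀)] with δ = 1/3. The short Python sketch below (function name hypothetical) evaluates that expression for the three initial angles used in Figure 11.

```python
import math


def trivial_solution_angle_deg(theta0_deg, delta=1.0 / 3.0):
    """Angle alpha_0 (degrees) at which the biaxial solutions vanish, eq. (70)."""
    t0 = math.radians(theta0_deg)
    numerator = delta * math.sin(t0) * math.cos(t0)
    denominator = 1.0 - delta * math.cos(t0) ** 2
    return math.degrees(math.atan(numerator / denominator))


if __name__ == "__main__":
    for theta0 in (0.0, 45.0, 90.0):
        print(f"theta_0 = {theta0:4.1f} deg  ->  alpha_0 = {trivial_solution_angle_deg(theta0):.2f} deg")
```

For θ₀ = 45° the ratio is (1/6)/(5/6) = 0.2, and tan⁻¹(0.2) is approximately 11.3°, matching the value quoted above; for θ₀ = 0° and 90° the numerator vanishes and there is no shift.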

Figure 12 shows the graphs of the errors for the same three values of θ₀
when η = 50%. The general trends noted before are still present, with the central
approximation still superior, and comparison with the η = 50% graph in Figure 10
confirms that the magnitudes of the errors are still close.

The comparison of the magnitudes of the errors between the triaxial and the
biaxial cases suggests that for a given value of α' the triaxial error for each
approximation is the average of the biaxial errors over the range of θ₀. This
is more or less confirmed numerically in Figures 13 and 14, where the errors
using the central difference approximation with η = 10% are plotted against the
angle θ₀ for values of α' = 0°, 30°, 60°, 90°. Each graph shows how the error
varies over the range -90° ≤ θ₀ ≤ 90° for δ = 1/3 (representing a genuine biaxial
situation) and for δ = 0 (corresponding to the triaxial situation, as remarked
at the end of Section 7). We see that except for the case where α' = 0, the
errors in the biaxial case do indeed tend to cluster about the error in the triaxial
case. Hence, the errors computed using the simpler triaxial stress equations
should provide good estimates of the magnitudes of the errors to be expected when
using biaxial stress approximations. As further confirmation of this supposition,
we present in Table 3, for a range of values of θ₀, the maximum errors for the three
approximations considered here when η = 10%. Comparison of these errors with the
maximum errors in Table 2 when η = 10% shows that the errors are of the same order.


Table 3. Maximum Error as a Function of θ₀ using the Biaxial Stress Approximations
with η = .10

        Linearized Yield            Exact Yield
  θ₀      Forward Diff     Forward Diff   Central Diff

  0°        .003557          .003782        .000123
 15°        .003257          .004057        .000128
 30°        .002920          .003776        .000117
 45°        .002279          .003110        .000095
 60°        .001906          .002448        .000074
 75°        .001670          .001969        .000061
 90°        .001540          .001718        .000056

Figure 11. Graphs of the error Δθ for the biaxial approximations versus the angle
α' for three values of θ₀ when η = .10.

Figure 13. Graphs of the variation in the errors between the triaxial (δ = 0) and
biaxial (δ = 1/3) central difference approximation for α' = 0° and 30°.

Figure 14. Graphs of the variation in the errors between the triaxial (δ = 0) and
biaxial (δ = 1/3) central difference approximation for α' = 60° and 90°.




9.  SUMMARY  AND  CONCLUSION.  The  Prandtl-Reuss  equations  for  a  linear  strain 
hardening  material  have  been  integrated  exactly  for  a  prescribed  discrete  strain 
increment  for  the  cases  of  biaxial  stress  and  triaxial  stress.  A  comparison  of 
these  solutions  with  a  number  of  approximations  to  these  equations 
commonly  used  in  response  programs  has  been  performed.  The  comparison  shows  that 
the  central  finite  difference  approximation  to  the  flow  rule  in  combination  with 
the  exact  yield  condition  is  the  most  accurate.  For  example,  the  maximum  error 
using this approximation will be on the order of .01% of the yield stress for an
elastic  stress  increment  equal  to  10%  of  yield.  The  comparison  also  enables  us 
to  quantify  the  beneficial  effects  of  subincrementing  in  plasticity  approximations. 
Hence,  if  subincrementing  is  automatically  employed  whenever  the  elastic  stress 
increment  exceeds  5%  of  yield,  then  the  maximum  error  for  an  elastic  increment  as 
great  as  50%  of  yield  will  be  limited  to  approximately  .01%  of  yield.  Also, 
the  comparison  shows  that  the  simpler  triaxial  stress  analysis  provides  good 
estimates  on  the  errors  to  be  expected  with  the  equivalent  biaxial  approximations. 

The  central  finite  difference  approximation,  which  strangely  enough  is  little 
used,  has  been  recently  implemented  in  the  ADINA  response  program  (ref.  3  and  7) 
for the case of triaxial stress and will soon be programmed for the biaxial stress
case.  It  is  also  planned  to  implement  the  central  difference  approximation  in 
the  REPSIL  (ref.  8)  and  in  the  PETROS  (ref.  9  and  10)  series  of  shell  response 
programs  at  the  earliest  opportunity. 

An  intriguing  question  that  might  have  occurred  to  the  reader  is  why  not 
implement  the  exact  solution.  In  the  triaxial  stress  case  where  the  solution  (60) 
is explicit there is little doubt that it would be more effective, not so much
in  improving  accuracy,  for  the  central  approximation  is  very  accurate,  but  in 
obviating  the  need  for  subincrementing.  Hence,  the  implementation  of  the  exact 
triaxial  stress  solution  is  one  item  in  future  plans. 

As  for  implementing  the  exact  solution  in  the  biaxial  stress  case,  the  main 
objection  is  that  the  solution  is  for  a  special  plane  stress  situation  in  which 
there  is  no  shear  stress  component.  Hence,  the  solution  cannot  be  utilized  in 
most  programs  that  analyze  plane  stress  or  Kirchhoff  shell  problems.  Moreover, 
the  solution  is  implicit  and  requires  some  numerical  scheme,  such  as  Newton-Raphson, 
to  obtain  an  answer;  hence,  there  may  be  little  to  choose  in  terms  of  efficiency 
between  solving  for  the  implicit  exact  solution  and  using  the  central  difference 
approximation with subincrementing.
ABSTRACT


This paper describes two discrete-time LQ regulator algorithms which have
been implemented and tested on a microcomputer-based servo control system
located in the Stabilization Research Laboratory facility at ARRADCOM. These
algorithms include reduced-order observers for estimating system states and
disturbances, and LQ-based digital control laws for precision stabilization in
the presence of external disturbances.


1.  INTRODUCTION 


Modern control theory and digital microprocessors represent two emerging
technologies which have significantly altered the perception of the gun pointing
and stabilization problem from that of an isolated subsystem design process to
one in which all system state variables and error sources (including
target-induced errors [1]) are considered in the gun control law development.
Reference [2], in particular, illustrates the performance advantages associated with
modern gun control law design. In this example, an LQ controller was designed,
implemented and evaluated in live firing tests on a helicopter armament system.
The dispersion associated with the modern control design was 1.26 mr as compared
to 4.2 mr for the original design. The control law implementation in this
effort was performed using fixed configuration analog electronics which imposed
practical constraints in terms of exploiting the full benefits of modern observer
theory and disturbance accommodation methodology. These constraints are,
for the most part, eliminated by performing the control law implementation in
software on a microcomputer-based controller. The advantages of this approach
include reduced hardware complexity and cost, increased design flexibility,
commonality and growth potential, as well as improved system performance.

The discrete-time LQ regulator designs presented in this paper represent
the first of a series of microcomputer-based control concepts which will be
subjected to extensive laboratory testing, followed by implementation and
evaluation on the XM-97 Helicopter Turret System, discussed in [2]. One of
the designs discussed includes a four-state observer for estimating and
suppressing recoil torque disturbances. This design is similar to
the one proposed in [2], but not implemented, due to the limitations of the
analog electronics used for the control law implementation.


2.  DESCRIPTION  OF  LABORATORY  TEST  FIXTURE 


The inertia wheel test fixture used to evaluate the discrete-time regulator
algorithms developed in this paper consists of two DC torque motors which
drive the inertia wheel, an 800 Hz preamp, a demodulator, and a torque drive
amplifier. The testing facility and a representative block diagram of the test
fixture are shown in Figs. 1a and 1b. Note that one torque motor, denoted by
subscript 1, is used for regulation and tracking while the second torque motor
(denoted by subscript 2) is used to generate torque input disturbances. A
listing of the motor/inertia wheel parameters is given in Table 1.

For  practical  reasons  we  neglect  the  armature  inductance  in  the  motors' 
model.  The  open-loop  state-space  representation  of  the  inertia  wheel  test 
fixture  is  then  given  by: 
where θ and θ̇ represent motor shaft angular motion and velocity, f is Coulomb
friction (modeled as a constant bias disturbance), and T₂ is an external torque
disturbance. In our work, T₂ simulates recoil disturbance, and we replace this
torque notation with r(t). In general, recoil disturbance is modeled as a
damped sinusoid. Because of practical constraints, we simplify this model to
a pure sinusoid with frequency ω. The equation representing the disturbance
states is therefore


(2-2)


Note  that  only  the  position  is  available  for  online  measurement. 


3. CONTROL LAW SYNTHESIS
3.1 Problem Formulation

The problem is restated here in a form amenable to control law
synthesis. We are given the dynamic system represented in state space as
Figure 1a. Testing Facility at ARRADCOM


Figure 1b. Two Torque Motor Diagram of Inertia Wheel.


TABLE 1. MOTOR/INERTIA WHEEL PARAMETERS

Parameter                                     Motor 1    Motor 2    Units

Back emf                                      0.347      0.54       vdc/rad/sec
Torque                                        0.257      0.4        ft-#/amp
Armature Resistance, R                        28         55         ohms
Armature Inductance, L                        0.0125     0.025      henry
Armature Time Constant (Measured), T = L/R    1/1500     1/1500     sec
Torque at 30V                                 0.218      0.258      ft-#
Maximum Friction, T_f                         0.1        0.05       ft-#
Maximum unbalance                             0.1        0.1        ft-#
Wheel Inertia, J                                   0.416            in-#-sec²

(3-1)

ẋ₂ = …

y = c x₁

where x₁ represents the plant state, x₂ the torque disturbance state, u the
control signal, and y the plant output. The state variables are as follows:

x₁' = [θ, θ̇]
x₂' = [r, ṙ, f]

θ = motor shaft angular displacement (gun pointing angle)
θ̇ = motor shaft angular velocity
r = recoil torque disturbance
ṙ = recoil torque rate
f = Coulomb friction and any other torque bias disturbance

The matrices A₁, A₂, and b₁ are as per Eqs. 2-1 and 2-2, and F = …,
c = [K_out 0].

The  simplified  plant/disturbance  configuration  is  shown  in  Fig.  2. 
Figure 2. Plant/Torque Disturbance Configuration.


Several  remarks  are  pertinent: 

1.  The  back  emf  gains  and  are  combined  into  a  single  gain. 
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^-■Vv 


K. 


T. 

2  “2 


(3-3) 
2. The linearized model (Eq. 3-1) requires that the Coulomb friction
term, f, be represented as a simple torque bias term, independent
of θ̇.

3. For our design, we consider the power amplifier gain K_in = 12 and the
resolver gain K_out = 10 as a part of the physical plant.

The  design  problem  now  is  twofold: 

1. Given the plant output measurement, y(t), obtain an estimate of
the states x₁ and x₂.

2. Design a control law that would regulate the pointing angle, θ(t),
in the face of any persisting torque disturbances. In other words, the
control law requirement is to null out x₂ and to regulate the
plant state x₁.

We  consider  first  the  control  law  design. 


3.2  Control 


It is convenient to synthesize the control law first, and then to proceed
with the estimation scheme. The usual practice is to rewrite the system (Eq.
3-1) with the augmented plant/disturbance dynamics state


such  that 
Upon substitution of the parameter values of Table 1, and with the "recoil"
frequency of 10 Hz (ω = 62.83 rad/sec), one obtains

(3-7)


We now obtain the feedback control law, u = -K_c x, by a straightforward
application of optimal control theory. The optimal gains, K_c, are derived by
minimizing the cost functional

$$J(u) = \int_0^\infty (x'Qx + u'Ru)\,dt, \qquad Q \ge 0, \quad R > 0 \qquad (3-8)$$

where R = 1 and Q = diag[5·10⁴ 0 0 0 0]. This choice yields the continuous-time
gains

K_c = [223.6 11.8 1.851 0.084 -9.07] (3-9)

which yield the closed-loop pole locations for θ, θ̇ at -18 ± j18.
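
The link between the weight Q, the first two gains and the closed-loop poles can be illustrated on a reduced two-state model. The Python sketch below assumes the plant states (θ, θ̇) obey θ̈ = -a θ̇ + b u, with a cost that weights only θ² (weight q) and u² (weight r); for that case the Riccati equation has the closed-form solution K₁ = √(q/r) and K₂ = [√(a² + 2bK₁) - a]/b. The values a = 1.8 and b = 2.9 are not taken from the paper; they are assumptions chosen so that the sketch roughly reproduces the published gains, since the numerical plant matrices are not available in this text.

```python
import cmath
import math


def lq_gains_position_weight(a, b, q, r=1.0):
    """Closed-form LQ gains for theta_ddot = -a*theta_dot + b*u with cost
    integral(q*theta**2 + r*u**2) dt. Returns (k1, k2, closed-loop poles)."""
    k1 = math.sqrt(q / r)
    k2 = (math.sqrt(a * a + 2.0 * b * k1) - a) / b
    # Closed-loop characteristic polynomial: s^2 + (a + b*k2)*s + b*k1
    damping = a + b * k2
    disc = cmath.sqrt(damping * damping - 4.0 * b * k1)
    poles = ((-damping + disc) / 2.0, (-damping - disc) / 2.0)
    return k1, k2, poles


if __name__ == "__main__":
    # a and b are hypothetical plant parameters (see the lead-in above).
    k1, k2, poles = lq_gains_position_weight(a=1.8, b=2.9, q=5.0e4)
    print(f"k1 = {k1:.1f}, k2 = {k2:.2f}, poles = {poles[0]:.2f}, {poles[1]:.2f}")
```

Under these assumptions the first gain is √(5·10⁴) ≈ 223.6 regardless of a and b, which matches the first entry of (3-9), and the poles land close to -18 ± j18.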

However, since our design requires a discrete-time control law, we reformulate
our optimal control problem. This process is carried out in two steps, as
indicated in the appendix. It yields the equivalent discrete-time system

$$x_{k+1} = \Phi x_k + E u_k \qquad (3-10)$$

and with a sampling interval ΔT = 0.01 sec,

(3-11)

The equivalent discrete-time control law is then

$$u_k = -K_d x_k = -[182.8 \;\; 10.72 \;\; 1.563 \;\; 0.0775 \;\; -9.07]\,x_k \qquad (3-12)$$


3.3 Estimation

Implementation of the control law (Eq. 3-12) requires that an estimate be
made of the state x_k. This estimate is made using a discrete-time, reduced-order
observer, with the equations summarized in the appendix. This design procedure
yields an observer state, z_k, which evolves according to

$$z_{k+1} = A_1 z_k + A_2 y_k + A_3 u_k \qquad (3-13)$$

and the state estimate is given by

$$\hat{x}_k = A_z z_k + A_y y_k \qquad (3-14)$$

A necessary constraint in this design is the observer bandwidth, as it must
not exceed the Nyquist sampling rate.

In this paper, we treat two basic cases:

1. The "recoil" disturbance is not operative, and the only torque
that disturbs the gun is Coulomb friction. The observer needs
to estimate θ̇ and f only.

2. All disturbances are operative; the observer estimates θ̇, r,
ṙ, and f.


3.3.1 Two-State Observer. Since the recoil dynamics can be eliminated from
the state Eq. 3-10, we obtain the second-order observer which estimates the
angle-rate and friction terms:

(3-15)

The design variables Q and R (see appendix) were chosen as

R = 1, Q = diag(0, 100) (3-16)


and the observer poles are at 0.64 ± j0.25. The selection of R and Q is not
arbitrary. If we assume that the observer (Eq. 3-13) has a continuous-time
equivalent, it would be desired to locate the poles of such an observer to
the left of the (continuous-time) optimal controller closed-loop poles. The
equivalent observer poles are taken as the eigenvalues of the matrix (ln A₁)/ΔT*
and for the chosen R and Q they are -38 ± j38.
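
The logarithmic mapping from a discrete-time pole z to its continuous-time equivalent s = ln(z)/ΔT can be checked with a few lines of Python. The sketch assumes the observer's discrete-time poles are 0.64 ± j0.25 (reading the imaginary part as 0.25) and applies the mapping to each eigenvalue separately; the function name is hypothetical.

```python
import cmath


def continuous_equivalent_pole(z, dt):
    """Map a discrete-time pole z to its continuous-time equivalent ln(z)/dt."""
    return cmath.log(z) / dt


if __name__ == "__main__":
    dt = 0.01  # sampling interval in seconds
    s = continuous_equivalent_pole(complex(0.64, 0.25), dt)
    print(f"s = {s.real:.1f} + j{s.imag:.1f}")
```

With ΔT = 0.01 sec this gives roughly -37.5 ± j37.2, consistent with the quoted -38 ± j38 and to the left of the controller poles at -18 ± j18.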


3.3.2 Four-State Observer. We now design a reduced-order observer which
estimates all four unmeasured states: θ̇, r, ṙ, and f. Three design cases are
considered; in subsection 3.4 we compare these designs by evaluating the
closed-loop disturbance rejection properties. In all three designs we use R = 1 and
vary Q. The equivalent continuous-time observer poles and the respective Q
values are given in Table 2.†

3.4 Closed-Loop Disturbance Rejection Properties

Figure 3 shows the closed-loop configuration, where the torque T represents
any unmodeled disturbance which may excite the gun inertia. It is of
major interest to evaluate the capability of the control algorithm to suppress
such torque disturbance at the gun output, θ. Simultaneously, one must ensure
appropriate loop stability margins. It is required, therefore, to examine the
closed-loop frequency response.

* This matrix may be computed by its Taylor series expansion.
† The observer numerical values are obtained in the same straightforward manner
as those of the two-state observer, and are not given here.


TABLE 2. EQUIVALENT OBSERVER POLES

Case    Q                          Observer Poles (Continuous-Time Equivalent)

I       diag(0, 10², 0, 10²)       -20 ± j62,  -32 ± j39
II      diag(0, 1, 0, 10³)         -1 ± j63,   -66 ± j70
III     diag(0, 7, 0, 10⁴)         -0.9 ± j63, -137 ± j141

Figure 3. Closed-Loop Configuration.


One can represent the closed loop by an n+(n-m)-th order state equation
[3], viz.,

(3-17)

In order to facilitate the frequency analysis we transform Eq. 3-17 into a
continuous-time equivalent via a logarithmic transformation, as indicated in
subsection 3.3.1. The unmodeled torque disturbance, T, is then appended,
resulting in the state equation
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CL 
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ACL  ■  AT  ln  *CL  ' 


(3-18) 
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where  L*  --  [0  1/3  0  0  0  !  0*  j.  The  transfer  function  of  interest  is 

—T  1  — n-m 

»(s>  =  c  ( sl-A  )~1b„,  (3-19) 

T  O  CL  — T 

where  C  =  (1  0  0  0  0  j  O'  ) . 

o  1  — n-m 

Bode plots of |θ(s)/T(s)| are shown in Figs. 4 through 7. Figure 4 shows
the simple design case for which the recoil disturbance is inoperative
(subsection 3.3.1). Two cases are presented:

1. The friction estimate is fed back, and the discrete control
gains are

K_d = [182.8 10.72 -9.07] (3-20)

2. The friction estimate is not used in the feedback loop,
i.e., the f gain (-9.07) is set to zero.

Case 1 is characterized by a low frequency gain droop which demonstrates the
closed-loop capability to cancel out low frequency (bias) disturbances. This
is not the case, however, when the friction term is excised from the feedback
path, as shown in Fig. 4.

Gain and phase margins are obtained by breaking the loop at u, and by
computing the u-to-u open-loop transfer function. For case 1 the margins are
(∞, 50°) and for case 2 (12 dB, 45°).

The  full  recoil  disturbance  case  is  presented  in  Figs.  5  through  7.  Shown 
are  the  three  design  cases  (I  through  III)  as  indicated  in  Table  2.  Several 
comments  are  in  order: 

1. The three designs exhibit a notch at (or just about) the
recoil frequency (10 Hz).

2. The notches' depth and width vary considerably. Table 3
lists the uncancelled poles and zeros of |θ(s)/T(s)| for
all three cases; it is evident that the proximity of the
complex zero pair to the disturbance poles (located on the
imaginary axis) controls the notches' depth.

3. The stability properties of the closed loop are discussed in
detail in [4]; the gain and phase margins are (12 dB, 40°) in
case I, (11 dB, 50°) in II, and (10 dB, 60°) in III.


TABLE 3. |θ(s)/T(s)| UNCANCELLED POLES AND ZEROS

Case    Poles                                   Zeros

I       -20 ± j62,  -32 ± j39,   -18 ± j18      -5 ± j67,    -0.1, -215
II      -1 ± j63,   -66 ± j70,   -18 ± j18      -0.25 ± j63, -0.3, -401
III     -0.9 ± j63, -137 ± j141, -18 ± j18      -0.05 ± j63, -0.7, -7450

4. MICROPROCESSOR HARDWARE DESCRIPTION


Microprocessor hardware/software development facilities used in implementing
and evaluating the discrete linear quadratic control algorithms discussed
in this paper are shown in Fig. 8. The host computer shown is an upgraded
8080-based MDS 220 development system with the memory expanded to 64K bytes RAM and
1.25 megabytes disk. In addition, there is a high-speed line printer, an
8-channel 12-bit A/D, a 4-channel 12-bit D/A and a PROM programmer for 2708,
2716, and 2732 PROMS (8, 16, and 32K bits per unit, respectively). Because
the 8080 requires 1.2 milliseconds to perform a 32-bit floating point multiply,
a means of speeding up computation was required in order to complete the
control algorithm computation within the required 0.01 second sample time.
Therefore, an SBC 310 high-speed mathematics board was added which does the same
multiply in 85 microseconds. However, this board must communicate with the
CPU via the system bus in order to store and then load the required four-byte
data words. This process requires approximately 90 microseconds. To minimize
this excess overhead time, another SBC 310 high-speed mathematics board was
added, which permits one board to compute while the other is storing data.


Figure 8. Hardware/Software Facilities.

A diagram of the inertia wheel interfaced with the MDS 220 Microprocessor
Development System is shown in Fig. 9. The signal generator is used to drive
Motor No. 2, which in turn generates disturbances to the wheel. The
microcomputer system controls Motor No. 1, which stabilizes the wheel in the
presence of disturbances induced by Motor No. 2.




Each preamp/power amp combination is capable of supplying a total of
±20 volts comfortably at stall current. The demodulator is a standard analog
design. The output of this device has very low ripple; however, this
repetitive noise is sufficient to present stability problems with some high
bandwidth observer/controller designs. In order to minimize the effects of
this measurement noise, a digital demodulation device was developed to replace
the analog demodulator. A detailed discussion of this device will appear in a
separate paper.


Figure 9. Inertia Wheel - Computer Interface.


5. SOFTWARE IMPLEMENTATION


The microcomputer program which implements the discrete LQ regulator
algorithms is partitioned into two parts, as shown in Fig. 10. The first
section, written in FORTRAN, converts decimal numbers to the 32-bit floating
point format required by the high-speed mathematics boards. The second part
is written in assembly language and executes the algorithm for the controller
and the observers.

Originally, the basic design program (without the recoil disturbance) was
written in FORTRAN. However, the program required 54 milliseconds to complete
each iteration. Since the control law design requires that each iteration
be completed within 10 milliseconds, recoding was necessary to speed up the
computation. This was accomplished by recoding the algorithms in sequential
assembly language statements, using macro definitions. This modification
reduced the program execution time from 54 milliseconds to 4.5 milliseconds.

The iteration time for the recoil torque observer design was reduced in similar
fashion to 11.5 milliseconds. To further reduce the iteration time below 10
milliseconds, error check routines were deleted, resulting in an iteration time
of 8 milliseconds. Table 4 summarizes the execution times associated with each
implementation. Further reductions in execution time require the use of a
faster microprocessor such as the 8086/8087.

Figure 10. Algorithm Flowchart.


TABLE  4.  EXECUTION  TIMES  (FLIT) 

Since assembly language was needed for algorithm implementation, two user
macros, PUT and GET, and one submacro, WAIT, were defined. Figure 11 defines
each macro and gives the expanding sequence for each parameter. Figure 11 also
defines the individual terms comprising the macros. The flow charts for each
macro are provided in Figs. 12, 13, and 14. It should be noted that these
macro routines are not called like a subroutine, but are "expanded." When the
macros were coded, all required variations were included in the routine or
"defined." When the programmer references a defined macro, he types its name,
followed by a list of parameters enabling selection of various parts of the
routine and modifying the internal variables, thereby customizing the macro
for that particular use. This expanded version is then inserted into the main
program, by the assembler, where its reference existed. This permits
programming time to be minimized while maintaining maximum execution speed by
utilizing sequential coding to eliminate subroutine call times.

When the PUT macro is expanded, the I/O port address and memory base
address are first obtained for the selected board. The WAIT submacro is then
expanded. If required, four bytes of data for each address, Data 1 and Data
2, are then stored in the board memory. Finally, if required, the type of
operation to be performed is then transferred to the board via the port.

When the GET macro is expanded, the I/O port address and memory base
address are first obtained for the selected board. The WAIT submacro is then
expanded. Finally, if required, four bytes of data are loaded from the board
and stored at address TMP.

When the WAIT submacro is expanded within either macro PUT or GET, if a
0, 1, 2, or 3 is designated as the substitution for variable WTERR, execution
is halted until the predesignated math board has completed its operation. If
a 1 or 3 is designated, the CPU corrects for overflow (OF) or underflow (UF).
Finally, if a 2 or 3 is designated, the error code is checked, and if other than
an OF or UF error is found, the CPU prints the type and location of the error
(program point counter) on the system console, then terminates execution.
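
The WTERR options described above behave like a two-bit flag word: one bit
selects the overflow/underflow correction and the other selects the check for
the remaining error codes. A minimal sketch of that decoding follows, written
in Python rather than 8080 assembly. The function name `wait_steps` and the
step labels are hypothetical, and treating any other WTERR value as "no wait"
is an assumption that the text does not state.

```python
def wait_steps(wterr):
    """Return the actions a WAIT expansion would contain for a WTERR value."""
    if wterr not in (0, 1, 2, 3):
        return []                       # assumption: no WAIT code is generated
    steps = ["wait_for_board"]          # 0-3: always block until the board is done
    if wterr & 1:                       # 1 or 3: correct overflow/underflow
        steps.append("correct_overflow_underflow")
    if wterr & 2:                       # 2 or 3: report other errors, then stop
        steps.append("report_other_errors_and_halt")
    return steps


for value in range(4):
    print(value, wait_steps(value))
```

Deleting the error checks, as was done to bring the recoil design under 10
milliseconds, corresponds here to expanding with WTERR = 0.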

The first routine, FDATC, was kept totally in FORTRAN. The function of
this program is to convert numbers in standard decimal format to 32-bit floating
point format. The program flow is shown in Fig. 15 and operates as follows.
This FORTRAN routine calls an assembly routine (in the main program) which sets
up a storage table for 32-bit (4-byte) words. Regaining control, FDATC then
passes the starting address of each data word to another assembly subroutine
for storage in the table. This procedure is then repeated for storing the
constants.

Referring to Fig. 16, the input routine which controls the A/D converter
is called ADIN. Since the wheel is moved only a short distance, the voltage
differential to the A/D converter is small. To maximize resolution of the
A/D, a programmable preamplifier is keyed by the program. If the programmer
sets the SCLD flag to 0, the amplifier gain is 1. If the flag is 1, a gain
of 8 is programmed and then rescaled back down in software to prevent
saturation of the A/D. If a large wheel displacement occurs, a third option is
available; i.e., SCLD = 3, which causes the A/D to sample once with a gain of
one, choose the correct gain and then sample as before.
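
The gain selection described above can be sketched as follows. This is an
illustration under stated assumptions, not the ADIN routine itself: the ±10
volt input range, the ideal 12-bit quantizer, the rule used to choose the gain
when SCLD = 3, and all function names are hypothetical.

```python
ADC_MIN, ADC_MAX = -2048, 2047        # 12-bit two's-complement codes
FULL_SCALE_V = 10.0                   # assumed +/-10 V converter input range
LSB_V = 2 * FULL_SCALE_V / 4096       # volts per count


def quantize(volts):
    """Ideal 12-bit conversion with clipping at the rails."""
    code = round(volts / LSB_V)
    return max(ADC_MIN, min(ADC_MAX, code))


def adin(volts, scld):
    """Return the measured voltage after gain selection and software rescaling."""
    if scld == 0:
        gain = 1
    elif scld == 1:
        gain = 8
    elif scld == 3:
        probe = quantize(volts)       # first sample, taken at unity gain
        gain = 8 if abs(probe) * 8 < ADC_MAX else 1
    else:
        raise ValueError("unsupported SCLD value")
    code = quantize(volts * gain)     # sample through the preamplifier
    return code * LSB_V / gain        # rescale back down in software


small, large = 0.1003, 5.0
print("small input error, gain 1:", abs(adin(small, 0) - small))
print("small input error, gain 8:", abs(adin(small, 1) - small))
print("large input, gain 8 (clipped):", adin(large, 1))
print("large input, auto-ranged:", adin(large, 3))
```

With these numbers the gain of 8 reduces the quantization error on the small
input, while the same gain clips the large input; the two-sample option avoids
the clipping at the cost of one extra conversion.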

The second part of ADIN was written to try to decrease noise at the
algorithm input. If flag AVG = 0, then one sample is used for the data. If
AVG ≠ 0, seven additional samples are taken quickly and averaged. This
averaging routine takes about 1.5 milliseconds, and even though this caused a
slight phase shift, it was not great enough to affect algorithm performance.
However, since no noticeable improvement in system response was observed using
the averaging routine, it was deleted in the implementation of the recoil case
design.

Finally, the last section of ADIN converts the 12-bit fixed-point input
from the A/D to 32-bit floating point. A problem which arises in this process
is that when a 16-bit fixed-point zero (00 00) is converted to 32-bit
floating point, the high-speed math board does the conversion but flags an
error. Therefore, before each conversion, if the data is zero, a 32-bit zero
is forced as the return data.

To obtain the correct sampling rate (0.01 sec) the entire computation
cycle must be completed in less than 0.01 sec, followed by a delay routine.

Figure 17 shows the two delay routines used. The first, DLYS, counts down
a 32-bit number to implement a software delay. DLY2, on the other hand, waits
until the A/D is triggered, forming a hardware delay. Both routines contain
code to invert the polarity of a 5-volt square wave from a spare D/A channel
for timing the full sampling rate; i.e., TIME = 1/2 period (square wave).

The three error routines are shown in Fig. 18. If the error is other than
OF or UF, HSME prints out the type of error and where in the program the error
occurred (point counter), and then exits to give control to the system monitor,
terminating execution. Since the math boards have no extended precision
temporary numbers, when an overflow occurs, the board computes the correct
number, but subtracts BE}$ from the exponent. To minimize this error, ERR3 sets
the value to maximum. ERR4 does the same but with an underflow error, since a
BE} & was added to the exponent in this case. In addition, both ERR3 and ERR4
output a narrow pulse to the square wave timer, +10 volts for overflow and
-10 volts for underflow, to enable observation of these two errors.

In a continuing effort to minimize execution time, the program will
eventually be run on an 8086 microprocessor with an 8087 coprocessor for
floating point computation. The 8086 runs off a 5-megahertz clock and has a
six-byte instruction queue for increased speed with consecutive statement
coding (600 ns for minimum statement). With the 8087, a 32-bit floating point
multiplication takes only 18 microseconds, and since it operates as a
coprocessor, the overhead is only 26 microseconds. Also, the 8087 performs
double precision 64-bit computation and has an intermediate temporary storage
register, 80 bits wide, to eliminate the overflow and underflow problem.

6.  PRELIMINARY  TESTING  RESULTS 


As indicated in subsection 3.4, the torque-to-pointing angle transfer
function, |θ(s)/T(s)|, is of major interest in evaluating the design
performance. Experimentally, it is convenient to obtain y(s)/e_V2(s) (see
Fig. 1) as it (approximately) scales with |θ(s)/T(s)|. (Because of the back
emf effect it will not scale exactly.) Figures 19 through 22 show the Bode
plots of y(s)/e_V2(s) for the two- and four-state observer designs:

1. Figure 19 shows the case in which the friction estimate is
fed back (two-state observer).

2. Figure 20 is the case where the friction estimate is not used
in the feedback path (two-state observer).

3. Figures 21 and 22 correspond to the four-state observer
design (cases I and III of subsection 3.3.2, respectively)
with the friction estimate excluded from the control law
(one can detect the notches at 10 Hz).


Figure 17. Delay Routines.

Figure 18. Error Routines.

Figure 19.

Figure 20.

It is evident that the disturbance rejection properties in these cases
replicate the theoretical results (Fig. 4). By overlooking the Bode plots'
imperfections due to the analyzer limitations, one can see that the friction
feedback term effects the desired low frequency (up to ~5 Hz) gain droop.

This property is well illustrated in Fig. 23. We excite the secondary motor
with a sinusoidal signal at 1, 5, and 10 Hz, and show the inertia wheel
responses in the two cases of the two-state observer. It is clear that the low
frequency disturbance (1 Hz) is well suppressed when the friction term is fed
back. It is also evident that the break frequency is somewhat above 5 Hz, as
we still observe, at this frequency, a better disturbance suppression
performance in the friction feedback case. At 10 Hz both designs exhibit the
same response, which indicates that we are beyond the break frequency.

Similar tests were performed on the four-state observer designs, but no
pictures comparable to Fig. 23 are available at this time. It was observed,
however, that the 10 Hz disturbance is further suppressed by a factor of ~2
when the recoil estimate is used in the feedback loop (Cases I through III).

7.  CONCLUSIONS 


In this paper, two discrete-time control algorithms were obtained. They
were designed to stabilize a microcomputer-based servo control system
(emulating a helicopter turret system) located in the Stabilization Research
Laboratory facility at ARRADCOM. The algorithms include LQ-based digital
control laws for precision stabilization in the presence of external torque
disturbances, and reduced-order observers for disturbance and system state
estimation. Both algorithms were successfully implemented and tested at the
ARRADCOM facility, and it is shown that the experimental results are in close
agreement with the theoretical results.

APPENDIX


1. DISCRETE-TIME CONTROL LAW FORMULATION

The process of obtaining an equivalent discrete-time control law from the
continuous-time system is carried out in two steps [5]:

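1. The discrete-time representation of the cost functional
(Eq. 3-8) is

$$ J(u) = \sum_{k=0}^{\infty} \left( x_k' Q_d x_k + 2 x_k' S_d u_k + u_k' R_d u_k \right) \qquad \text{(A-1)} $$

subject to the discretized system

$$ x_{k+1} = \Phi x_k + \Gamma u_k \qquad \text{(A-2)} $$

where

$$ \Phi = \Phi(\Delta T) = e^{A \Delta T} $$

$$ \Gamma = \Gamma(\Delta T) = \int_0^{\Delta T} \Phi(t)\, b \, dt $$

$$ Q_d = Q_d(\Delta T) = \int_0^{\Delta T} \Phi'(t)\, Q\, \Phi(t) \, dt \qquad \text{(A-3)} $$

$$ S_d = S_d(\Delta T) = \int_0^{\Delta T} \Phi'(t)\, Q\, \Gamma(t) \, dt $$

$$ R_d = R_d(\Delta T) = \int_0^{\Delta T} \left( \Gamma'(t)\, Q\, \Gamma(t) + R \right) dt $$

and ΔT = sampling interval.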

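2. To facilitate computations, the optimal control problem
(Eq. A-1 and A-2) is reduced to an equivalent problem

$$ J(u) = \sum_{k=0}^{\infty} \left( x_k' Q_{eq} x_k + u_k' R_d u_k \right) \qquad \text{(A-4)} $$

subject to

$$ x_{k+1} = \Phi_{eq} x_k + \Gamma u_k $$

where

$$ \Phi_{eq} = \Phi - \Gamma R_d^{-1} S_d', \qquad Q_{eq} = Q_d - S_d R_d^{-1} S_d' \qquad \text{(A-6)} $$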
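
The two steps above can be illustrated numerically for a scalar plant. The
sketch below is not the design code used in the paper. It assumes a
first-order system dx/dt = a x + b u with made-up values of a, b, q, and r,
evaluates the integrals of (A-3) by Simpson's rule, and assumes the standard
cross-term elimination, Phi_eq = Phi - Gamma R_d^-1 S_d' and
Q_eq = Q_d - S_d R_d^-1 S_d', for the equivalent problem. It then checks that
both formulations give the same Riccati solution, and that the two feedback
gains differ only by the term R_d^-1 S_d'.

```python
import math


def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def discretize(a, b, q, r, dt):
    """Scalar versions of Phi, Gamma, Q_d, S_d, and R_d (requires a != 0)."""
    phi = lambda t: math.exp(a * t)
    gamma = lambda t: b * (math.exp(a * t) - 1.0) / a   # integral of phi * b
    q_d = simpson(lambda t: phi(t) * q * phi(t), 0.0, dt)
    s_d = simpson(lambda t: phi(t) * q * gamma(t), 0.0, dt)
    r_d = simpson(lambda t: gamma(t) * q * gamma(t) + r, 0.0, dt)
    return phi(dt), gamma(dt), q_d, s_d, r_d


def riccati(phi, gam, q, s, r, iters=5000):
    """Iterate the scalar discrete Riccati equation with cross weight s."""
    p = 0.0
    for _ in range(iters):
        p = q + phi * phi * p - (phi * p * gam + s) ** 2 / (r + gam * gam * p)
    gain = (phi * p * gam + s) / (r + gam * gam * p)
    return p, gain


# Hypothetical plant and weights; dt matches the 0.01 s sample time.
phi, gam, q_d, s_d, r_d = discretize(a=-1.0, b=1.0, q=1.0, r=0.1, dt=0.01)
p_full, k_full = riccati(phi, gam, q_d, s_d, r_d)

phi_eq = phi - gam * s_d / r_d        # scalar Phi - Gamma R_d^-1 S_d'
q_eq = q_d - s_d * s_d / r_d          # scalar Q_d - S_d R_d^-1 S_d'
p_eq, k_eq = riccati(phi_eq, gam, q_eq, 0.0, r_d)

print("Riccati solutions:", p_full, p_eq)
print("feedback gains:   ", k_full, k_eq + s_d / r_d)
```

The equivalent problem has no cross term, so it can be handed to a standard
regulator solver; the cross term reappears only as the fixed gain offset.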


2.  DISCRETE-TIME  REDUCED-ORDER  OBSERVERS 

Unlike the control law design, discrete-time reduced-order observers cannot
be obtained by transforming the continuous-time problem into an equivalent
discrete-time problem. It is required, therefore, to carry out the observer
design process from the discrete-time system (Eq. A-2) directly.

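The estimator dynamics are constructed to estimate the n-m unmeasured
states, where n is the system order, and m is the number of available
measurements. We partition, therefore, the system (Eq. A-2) as

$$ \begin{bmatrix} x_{r,k+1} \\ x_{p,k+1} \end{bmatrix} = \begin{bmatrix} \Phi_{11} & \Phi_{12} \\ \Phi_{21} & \Phi_{22} \end{bmatrix} \begin{bmatrix} x_{r,k} \\ x_{p,k} \end{bmatrix} + \begin{bmatrix} \Gamma_1 \\ \Gamma_2 \end{bmatrix} u_k $$

$$ y_k = \begin{bmatrix} C_1 & C_2 \end{bmatrix} \begin{bmatrix} x_{r,k} \\ x_{p,k} \end{bmatrix} $$

where $x_{r,k} \in R^{m \times 1}$, $x_{p,k} \in R^{(n-m) \times 1}$ and $C_1$ is invertible. One may
estimate these states by employing the reduced-order observer due to Gully [2],
viz.,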

(A-8)


The selection of T and H completes the observer design. The transformation
matrix T enables one to scale the observer gains, and should be chosen
accordingly. The selection of the gain matrix H determines the observer
dynamics (Eq. A-9), and is, therefore, more subtle. It can be shown [3], [6]
that choosing H is equivalent to solving an optimal control problem


The choice of Q and R is not as straightforward as in the control law problem,
since, strictly speaking, the observer state does not usually have a physical
interpretation. These matrices should rather be chosen for desired observer
pole locations and gain magnitude considerations.

In summary, we conveniently rewrite the observer equations (A-8 through
A-10) in a concise form, viz.,

$$ z_{k+1} = A_1 z_k + A_2 y_k + A_3 u_k $$

$$ \hat{x}_{p,k} = A_z z_k + A_y y_k \qquad \text{(A-13)} $$

where the coefficients A1, A2, and A3 can be identified from Eq. A-9, and


ABSTRACT


A data acquisition and analysis system using a minicomputer is described.
The development of the system and its application to signal processing and
analysis are currently part of a program on physical security research. The
research is directed toward the development of target identification
techniques to be implemented in digital signal processors (or intrusion
detection sensors). The digital signal processor performs automatic
decision-making in different environmental conditions as a distributed
processing device for FIDS (Facilities Intrusion Detection System). The FIDS
is a central processor monitoring the security of an area or complex to be
protected. Data from various intrusion detection devices such as Radio
Frequency Motion Sensors (RFMS), Vibration Sensors, Passive Ultrasonic Sensors
(PUS), Ultrasonic Motion Sensors (UMS), Passive Infrared Motion Sensors
(PIMS), etc. have been partially analyzed. This presentation discusses the
test data from RF motion sensors and vibration sensors.




1.0  Introduction 


The  Counter  Intrusion  Laboratory  of  Mobility  Equipment  Research  and 
Development  Command  (MERADCOM)  is  currently  engaged  in  the  development  of 
data  acquisition  and  analysis  equipment  (DAAE).  The  DAAE  will  consist  of 
Data  Acquisition  Systems  (DACs)  and  a  Data  Analysis  System  (DAN).  The
DACs  will  be  microprocessor-based  recording  devices  with  software-control 
capability.  The  DACs  will  be  used  in  the  field  to  select,  record,  and  store 
intrusion  signals  as  well  as  non-intrusion  false  alarm  stimuli  data.  The  DAN 
is  a  Digital  Equipment  Corporation  PDP11  minicomputer  which  will  soon  be 
upgraded.  It  has  been  used  for  data  acquisition  and  analysis  in  the  laboratory. 
The  DAC  which  is  being  developed  will  consist  of  low-frequency,  medium- 
frequency  and  high-frequency  units.  The  DACs  and  the  DAN  will  be  operationally 
integrated  to  form  an  effective  data  acquisition  and  analysis  system  that  will 
provide  scientists  and  engineers  with  an  efficient  data-processing  and  computing 
tool  for  physical  security  RDT&E  programs. 

2.0  Intrusion  Detection  Sensors 

The Counter Intrusion Laboratory has developed a number of sensors for the
Facilities Intrusion Detection System (FIDS). The sensors included, but were
not limited to, the Balanced Magnetic Switch, Grid Wire Sensor, Duress Sensor,
Contraband Sensor, Vibration Sensor, Passive Ultrasonic Sensor, Ultrasonic
Motion Sensor, and Radio-Frequency Motion Sensor (or Microwave Motion Sensor).
Among those sensors, only the last four types required signal processing.
Furthermore, the vibration sensor and the passive ultrasonic sensor are
passive devices, and their circuit designs were almost identical except for
the bandpass filter. However, a rejection filter (through a multiplier) can
optionally be applied to the passive ultrasonic sensor to remove the energy
emitted from an active device, such as an ultrasonic motion sensor, if one is
present.

The RF motion sensor and the ultrasonic motion sensor developed by
MERADCOM are active devices. They apply the Doppler principle to detect the
motion of an intruder. However, those sensors were designed primarily for
indoor implementation. The received signals were usually complicated by
environmental multipaths and were generally very complex.

3.0  Analytical  Description  of  Intrusion  Signatures 

However, numerical generation of the intrusion signatures can be performed
through computer simulation or synthesis for some types of sensors. A brief
mathematical description is given here for the active intrusion detection
devices, particularly for the RF motion sensors.

Let T(t) be the signal transmitted by an antenna or a transducer,
represented as

$$ T(t) = A_0 \cos(2\pi f_0 t + \phi_0) \qquad (1) $$

where $f_0$ is the carrier frequency, $A_0$ the amplitude, and $\phi_0$ the phase. Let
the information received by an antenna or a microphone be R(t). Then
R(t) may be represented as

$$ R(t) = \sum_{n=1}^{\infty} A_n \cos[2\pi (f_0 \pm f_n) t + \phi_n] \qquad (2) $$

where n = 1, 2, ..., ∞ indexes the multipath arrivals, $A_n$ are the amplitudes
of the signals, $f_n$ the Doppler-shifted frequencies due to a moving object,
and $\phi_n$ the multipath phases of the signal received.

Let T(t) and R(t) pass through a multiplier (mixer) as they do in the
RFMS and UMS circuits. Then, the output from the mixer can be represented as


On the right-hand side of eq. (3), the first summation consists of the
high-frequency information and the second summation is primarily the very
low-frequency components which contain the intrusion information. We are
interested in the low-frequency information only. Therefore, applying a
low-pass filter, we have

$$ S(t) \approx \frac{A_0}{2} \sum_{n=1}^{\infty} A_n \cos[\pm 2\pi f_n t + \phi_n - \phi_0] \qquad (4) $$


This  is  the  information  that  is  manageable  by  a  digital  computer. 
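
The step from the mixer output to eq. (4) follows from the product-to-sum
identity for cosines. A sketch of that step is given below; the symbol M(t)
for the mixer output and the infinite upper limit on the sums are assumptions
made here for illustration, since the original expression is not reproduced.

```latex
% Product-to-sum identity:
%   \cos\alpha \cos\beta = \tfrac{1}{2}[\cos(\alpha+\beta) + \cos(\alpha-\beta)]
% applied with \alpha = 2\pi(f_0 \pm f_n)t + \phi_n and \beta = 2\pi f_0 t + \phi_0.
\begin{align*}
M(t) &= T(t)\,R(t) \\
     &= \frac{A_0}{2}\sum_{n=1}^{\infty} A_n
        \cos\bigl[2\pi(2f_0 \pm f_n)t + \phi_n + \phi_0\bigr]
      + \frac{A_0}{2}\sum_{n=1}^{\infty} A_n
        \cos\bigl[\pm 2\pi f_n t + \phi_n - \phi_0\bigr]
\end{align*}
```

The first sum is centered near twice the carrier frequency and is removed by
the low-pass filter; the second sum is the low-frequency signature S(t).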

Equation (4) serves as a fundamental description of the intrusion signatures
for a number of sensors developed by the Counter Intrusion Laboratory. It
can be integrated analytically only for greatly simplified cases. However,
computer simulations can be performed to synthesize the signatures by using
the equation. Such simulations are useful for understanding the various
signal characteristics and for extracting the features of the simulated data
as well as the observed signatures.
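
A simulation of this kind might be sketched as follows. The example multiplies
a transmitted carrier by a two-path received signal, low-pass filters the
product, and compares the result with the signature predicted by eq. (4). It
is an illustration only: the 1 kHz carrier (far below the 915 MHz of the
RFMS), the path amplitudes, Doppler shifts and phases, the 20 kHz simulation
rate, and the moving-average filter are all assumptions chosen to keep the
example small.

```python
import math


def transmitted(t, a0, f0, phi0):
    """Eq. (1): the transmitted carrier."""
    return a0 * math.cos(2 * math.pi * f0 * t + phi0)


def received(t, f0, paths):
    """Eq. (2): sum of multipath arrivals; the sign of f_n carries the +/-."""
    return sum(a * math.cos(2 * math.pi * (f0 + fn) * t + ph)
               for a, fn, ph in paths)


def signature(t, a0, phi0, paths):
    """Eq. (4): the low-frequency part of the mixer output."""
    return 0.5 * a0 * sum(a * math.cos(2 * math.pi * fn * t + ph - phi0)
                          for a, fn, ph in paths)


def moving_average(x, half):
    """Centered moving average over 2*half+1 samples (a crude low-pass filter)."""
    width = 2 * half + 1
    return [sum(x[i - half:i + half + 1]) / width
            for i in range(half, len(x) - half)]


FS = 20_000.0                       # simulation sample rate in Hz (assumed)
F0, A0, PHI0 = 1_000.0, 1.0, 0.2    # toy carrier parameters (assumed)
PATHS = [(1.0, 5.0, 0.3), (0.5, -3.0, 1.1)]   # (A_n, signed f_n in Hz, phi_n)
HALF = 100                          # about a 10 ms averaging window

times = [k / FS for k in range(10_000)]       # 0.5 s of data
mixed = [transmitted(t, A0, F0, PHI0) * received(t, F0, PATHS) for t in times]
filtered = moving_average(mixed, HALF)
expected = [signature(t, A0, PHI0, PATHS) for t in times[HALF:len(times) - HALF]]

worst = max(abs(f - e) for f, e in zip(filtered, expected))
print("largest deviation from eq. (4):", worst)
```

The residual deviation comes from the imperfect filter, which leaves a small
part of the double-frequency term and slightly attenuates the Doppler terms.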

4.0  Data  Analysis  For  Intrusion  Detection  Sensors 

Testing and evaluation of intrusion detection sensors have been conducted
in the past. Two computer systems were used for the data acquisition and
analysis: one was the real-time data acquisition system, which was used in the
field for digital recording; the other was used mainly for data analysis in
the laboratory. Figure 1 shows the block diagram of the data analysis system
in the laboratory. The central processor was a DEC PDP11/05 minicomputer. The
peripherals included an RK05 disk, three magnetic tape drives, data storage
and display, a hard-copy unit, and above all an LPS11 device. The LPS11 is an
analog-to-digital converter by which we analyzed the data from the analog
tapes. The scope is for a quick view of analog signals.

Figure 1. Block Diagram Of The Data Analysis System In The Laboratory.

Data analyzed recently included the signatures from the passive ultrasonic
sensors, ultrasonic motion sensors, RF motion sensors, and vibration sensors.
However, we limit our discussion to the results from the RF motion sensors
and vibration sensors.

4.1  RF  Intrusion  Signals 

Testing of the RFMS was conducted in Bldg. 2093, Ft. Belvoir, VA. Figure
2 shows the configuration of the test building. Two RFMS transmitting antennas
were mounted at one end of the building. Two RFMS receiving antennas were
mounted at the other end, near the office area. The signals recorded were the
sum of the outputs of the two receiving antennas.

Figure 3 shows the background data without stimuli. The RFMS uses a
carrier frequency of 915 MHz. The data shown here were sampled at 40 Hz after
the carrier frequency had been removed (i.e., eq. (4)). Trace A was taken when
the heating system was in the warm-up phase. Trace B was recorded about 10
minutes after Trace A. At that time, the door was rattling in the wind. The
last trace of the time-series data was recorded in the general winter
background, with heating air flowing in the duct. The air flow caused
vibration in the duct, and the high-frequency components of the data were
probably phase fluctuations due to the RF reflection from the duct.

Figure 2. Configuration of Test Building (40' x 50').

Figure 3. Background Data Without Stimuli (time axis: 0 to 25.600 sec).

A. Heating System In Warm-up Condition

B. Door Rattling In The Wind

C. Heating Air Flow In The Duct

Figure 4 shows the time- and frequency-domain data recorded while a person
was shaking the steel door from outside. The purpose of the test was to
simulate windy conditions. The spectral lines here indicate the vibration
characteristics of the steel door. Those lines shifted to the lower-frequency
range and disappeared over time once the shaking stopped.

A test was conducted by opening and closing the entrance door a number
of times. Figure 5 shows the time-series data for the test. The first trace
illustrates the situation where the door was opened and closed normally and
continuously five times. Trace B shows the test where the same door was
opened and slammed shut five times. Because the door opened outward, the
contribution to the time-domain features came mostly from the situations
where the door surface was nearly parallel to the surface of the wall.

A number of walking or running tests were conducted in which a man walked
or ran along various paths. The dashed lines in Figure 2 indicate the paths
for the walking/running tests under discussion. Figure 6 shows the time-domain
data for the man walking or running along Path 6. The first trace of the
figure is the time-domain data where the man walked from the entrance door to
the office area following the dashed-line path. Trace B is the signature where
the man walked in the opposite direction of Trace A. If the man walked exactly
the same path in the opposite direction at the same speed, we might expect
Trace B to be the time reversal of Trace A. Trace C is the data taken when
the man ran along the same path to and fro twice.

Figure 5. Data By Opening And Closing The Front Entrance Door.

(A) Normally And Continuously Five Times

(B) Opening And Slamming The Door Five Times

Figure 6. Data By Man Walking Or Running (Path 6 in Figure 2).

A. Walking From Entrance Door To The Office Area

B. Walking From The Office Area To The Entrance Door

C. Running From The Entrance Door To The Office Area,
Then Back And Forth

Figure 7 shows the situations where the man was walking from right to left
and back twice in the building, following the dashed-line paths 7A, 7B, and
7C shown in Figure 2. The complex time-domain features of the data suggest
extensive environmental reverberation, multipaths, and Doppler spread of
frequencies. Another factor contributing to the complexity of the time-domain
features was the fact that two emitters and two receivers were used in the
test.

The complete set of data acquired in the RFMS test is compiled and
presented in the Appendix, which illustrates the time-domain data as well as
its Fourier amplitude spectrum. Discussion of the data analysis on RF sensors
is so far limited to signal processing, with emphasis on data acquisition and
display. Mathematical computations for parametrization or for feature
extraction have not been attempted yet. However, the time-domain features
seem meaningful. To a certain extent, they can be understood and make sense
visually.

4.2  Impact  Response  Analysis  For  The  Naval  Steel  Lattice  Vibration  Sensor  System 

In 1978, the BDM Corporation, under a contract with MERADCOM, conducted a
data acquisition effort for the steel lattice vibration sensor system at the
Naval Weapons Station in Yorktown, Virginia. One of the objectives of the
test was to collect the vibration sensor system response and the performance
information for analysis. Two of the naval ammunition magazines were used for
the test. The magazines were earth-covered arch-type structures of traditional
design and were made entirely of reinforced concrete.

Figure 8 shows the diagram of the steel lattice network and the sensor
configuration. The lattice network was typical of the naval installation.
The bunker was covered with earth at the ground level. Five detectors of the
J-SIIDS (Joint-Services Interior Intrusion Detection System) vibration sensors
were mounted on the central longitudinal bar as the signal receivers, as
shown in the figure. The upper left corner shows the configuration of the
detectors, whose outputs were connected to two separate processors. Analog
data was taken at the input to the processor and at the bandpass filter
output in the processor. Therefore, four channels of data were recorded on the
analog tape, of which two (ch. 7 and 6) were wideband input and the other two
(ch. 4 and 5) were noted as processed channels.

Figure 7. Data By Man Walking (Paths 7A, 7B, 7C In Figure 2)

Figure  9  shows  the  block  diagram  of  signal  processing  for  the  test  data  from 
the  vibration  sensors.  The  analog  data  was  digitized  and  stored  on  the  magnetic 
tapes.  In  order  to  acquire  the  digital  data  with  the  required  40KHz  sampling 
rate  for  analysis,  the  Ampex  recorder  was  played  back  with  a  factor  of  k  of  the 
recording speed. At this sampling rate, a substantial amount of data was written
on the tapes. To find the signals, we screened the raw digital tapes with an
envelope detector and printed the locations of the signals. On the basis of the
printer output, we displayed both the signal data and the noise data on the
CRT. The desired data were then written on the output tape for later analysis.
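
The screening step can be illustrated with a short sketch. The paper does not give the details of the envelope detector, so the rectify-and-average envelope, the 5 ms window, and the median-based threshold below are all assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np

def find_signal_segments(x, fs, window_s=0.005, threshold_factor=5.0):
    """Return (start, stop) sample indices where the envelope exceeds a threshold.

    Assumed detector: moving average of the rectified record, compared with
    threshold_factor times the median envelope (taken as the noise floor).
    """
    x = np.asarray(x, dtype=float)
    n = max(1, int(round(window_s * fs)))
    # Envelope: moving average of the rectified data.
    envelope = np.convolve(np.abs(x), np.ones(n) / n, mode="same")
    threshold = threshold_factor * np.median(envelope)
    above = envelope > threshold
    # Rising and falling edges of the boolean mask give segment boundaries.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(a), int(b)) for a, b in zip(edges[0::2], edges[1::2])]
```

The returned index pairs play the role of the printed signal locations; the records between them would be candidates for the noise ensemble.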

Figure  10  shows  a  display  of  the  data  from  a  thumper  signal  and  its  Fourier 
amplitude  spectrum.  The  signal  was  generated  by  applying  a  thumper  on  the 
concrete  surface  inside  the  bunker.  The  upper  trace  is  the  time-domain  data 
and  the  lower  part  is  its  frequency-domain  spectrum. 

Figure 9. Block Diagram of Signal Processing For The Test Data From The
Vibration Sensors

The object of the present analysis of the vibration sensor data from the
Yorktown test was to find the optimum detection bandwidth for the impact
response data from the bunkers. The other purpose was to verify the J-SIIDS
vibration sensor's bandpass characteristic. The lower left part of Figure 9
shows that the input data X_j pass through a convolution filter H_j and yield
the output Y_j. The purpose of the present work was then to find H_j so as to
maximize the signal-to-noise ratio of the output Y_j. To accomplish this, we
stacked the signal spectra and the noise spectra separately from the data
ensemble that had been edited onto the tape. Finally, we computed the filter
H_j in the frequency domain.
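
The stacking and filter computation can be sketched as follows. The paper does not state the formula used for the filter, so the matched-filter style weighting below (average signal amplitude divided by average noise power) is an assumption, as are the function names; it is one common way to favor frequencies where the signal-to-noise ratio is high.

```python
import numpy as np

def stacked_amplitude_spectrum(records):
    """Average the Fourier amplitude spectra of equal-length records (rows)."""
    records = np.asarray(records, dtype=float)
    return np.mean(np.abs(np.fft.rfft(records, axis=1)), axis=0)

def snr_weighting_filter(signal_records, noise_records, floor=1e-12):
    """Assumed weighting |H(f)| = S(f) / N(f)**2, normalised to unit peak.

    S and N are the stacked signal and noise amplitude spectra; `floor`
    guards against division by zero in empty noise bins.
    """
    s = stacked_amplitude_spectrum(signal_records)
    n = stacked_amplitude_spectrum(noise_records)
    h = s / np.maximum(n ** 2, floor)
    return h / h.max()
```

With records sampled at 40 KHz, `np.fft.rfftfreq(record_length, 1 / 40000.0)` gives the frequency axis against which the response can be plotted.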

The vibration sensor test produced a large volume of analog data, since a
number of stimuli were used in the test. The results presented here, however,
are limited to two stimuli, namely the thumper and the rotohammer. All four
channels of the recorded data were processed for impact response analysis
and for verification of the bandpass filter characteristics. Part of the
analysis was presented in a technical report in September 1980, and the
remaining part will soon be presented in a separate report. In the current
presentation, only results from Channels 7 and 4 (i.e., processor #2, input and
output) will be discussed.

4.2.1  Thumper: 

Figure 11 shows the average power spectrum from the edited noise records.
The noise spectrum indicated A.C. power-source contamination in the data.
Figure 12 shows the average signal amplitude spectrum from the thumper on the
concrete surface. It showed two significant passbands of energy: one in
the range below 6 KHz and the other in the 10-15 KHz band. The low-frequency
band was in the normal audible range, and the higher-frequency band was used
for detection so that false alarms could be reduced or eliminated.

Figure 11. Average Noise Fourier Amplitude Spectrum (Thumper)

Optimum Detection Response For Processor 2 Input (Thumper)

For the optimum detection computation, Figure 13 shows the frequency-domain
response of the filter which maximized the signal-to-noise ratio for the
detection circuit. A 9-point smoothing similar to the Hamming window was
applied to the response curve. The 6-dB response of the optimum detection
bandwidth in the higher frequency range was from about 10 KHz to 15 KHz.
Figure 14 shows the same analysis for the Channel-4 data.
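
The smoothing and bandwidth reading can be sketched as below. The exact weights of the 9-point smoothing are not given in the text, so a normalised 9-point Hamming window is assumed, and the 6-dB band is taken as the contiguous region around the peak where the amplitude response stays within 6 dB of the peak value. Both function names are hypothetical.

```python
import numpy as np

def smooth_9pt_hamming(response):
    """Smooth a response curve with a normalised 9-point Hamming window."""
    w = np.hamming(9)
    w /= w.sum()
    return np.convolve(response, w, mode="same")

def band_edges_6db(freqs, response, f_min=0.0):
    """Return (f_low, f_high) of the 6-dB band around the peak above f_min."""
    freqs = np.asarray(freqs, dtype=float)
    response = np.asarray(response, dtype=float)
    candidates = np.flatnonzero(freqs >= f_min)
    peak = candidates[np.argmax(response[candidates])]
    level = response[peak] * 10.0 ** (-6.0 / 20.0)  # 6 dB below the peak
    lo = peak
    while lo > 0 and response[lo - 1] >= level:
        lo -= 1
    hi = peak
    while hi < len(response) - 1 and response[hi + 1] >= level:
        hi += 1
    return freqs[lo], freqs[hi]
```

Setting `f_min` above the audible band (for example 8 KHz) restricts the search to the higher-frequency passband discussed above.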

4.2.2.  Roto  Hammer 

The same analysis was carried out for the signals using the rotohammer as
stimulus. Figure 15 presents the average frequency-domain amplitude spectrum
for the noise records taken during the rotohammer test. The noise samples
might include data taken while the rotohammer was running in the air. The
rotohammer ran continuously for a period of time at each test point during
the test. Figure 16 shows the average Fourier amplitude spectrum for the
rotohammer signals. There are two apparent important bands of energy in the
spectrum: one in the low-frequency range below 4 KHz and the other in the
8-12 KHz passband. (Note: For this part of the data, the digitization was done
after the Ampex recorder had been in use in the field for some time. From the
comparison with Figure 13, we suspect that the speed of the recorder was not
very accurate and that it may have slowed down somewhat while the data were
being digitized.) For the results of the optimum detection analysis, Figure 17
shows the response curve in the frequency domain for the processor 2 input
(channel 7, wideband), with no smoothing applied. Figure 18 shows the response
curve after smoothing. Figure 19 shows the computation for the processed
channel (ch. 4). Compared with Figure 14, the response band was shifted to a
lower frequency range, possibly because of the Ampex recorder speed error
noted earlier.

5.0  Summary  and  Discussion 
5.1  Summary 

The current data acquisition and analysis system for the physical security
RDT&E has been briefly presented. The intrusion signatures from RF motion sensors
and vibration sensors have been discussed. The time-domain RF sensor data
appeared to be comprehensible. In addition to envelope detection of intrusions,
detection based on time-domain features seems feasible and can be
attempted. Parametrization using the frequency-domain amplitudes may also be
an important area for feature extraction.

The impact response data from the concrete surface suggested two important
frequency bands of energy concentration: one in the low-frequency range
below 6 KHz and the other in the 10 KHz - 15 KHz range. The former passband
is in the normal audible range and is not suitable for detecting forced
intrusion. The latter passband, however, is useful for detecting forced
intrusion into earth-covered concrete structures, where false alarms can be
reduced to a minimum.

5.2  Discussion 

The present DEC PDP 11/05 minicomputer is just adequate for data acquisition
and display with some computing capability. The difficulty in performing
extensive computing with the current system lies in the fact that its memory
capacity is only 8K words. However, the system will soon be upgraded to a
medium-range PDP 11 system with additional peripherals. We expect that the
computing and data-handling capabilities of the data analysis system will be
enhanced significantly.

The time-domain features from the RF motion sensor are comprehensible in
some cases. A time-domain feature study can be done by first simplifying the
transmitter and receiver components for better understanding, and then
gradually increasing the number of transmitters and/or receivers up to the
configuration used in practical applications such as those shown in the test.

The data from the vibration sensor test in Yorktown, Virginia were very
voluminous. Various types of data were taken under different conditions, and
the data quality also varied from one case to another. The processed channel,
which was designed for detection in the vibration sensor, was in agreement
with the impact response data from the earth-covered concrete.


APPENDIX 

This appendix compiles the data for the RF motion sensor test at Building 2093.
A total of 21 tests were conducted. Presented here are the displays of the
time-domain amplitudes (1024 points) and the frequency-domain spectra. The
right-most number in the second line is the maximum count of the time-domain
amplitudes. For example, in Figure A-1, the maximum amplitude was 249, which
was relatively low compared with those of the walking tests, say
1027 in Figure A-10.

Figure A-2. Background - No Stimuli (Large Door Rattling In The Wind)

Figure A-3. Background - Heating Air Flow On Duct

Figure A-4. Swinging Large Steel Door From Outside

Figure A-5. Background

Figure A-6. Opening and Closing The Entrance Door Normally For Five

Figure A-10. Man Walking (See Figure 2, Path 7A)

Figure A-12. Man Walking (See Figure 2, Path 7C)

Figure A-13. Man Walking (From Right To Left, To The Central Line And Back)

Figure A-14. Man Walking (From Right To Left, 45° To The Central Line And Back)

Figure A-15. Man Walking (From Right To Left, 45° To The Central Line And Back)

Man Running (From Entrance Door To The Office Area And Back, See Figure 2, Path 5)

Figure A-17. Dolly With Four Wheels, Boxes Piled To Five Feet High, Pulled Across Floor By String

Figure A-19. Steel Ball [illegible] Rolled Slowly (Opposite To That Of Figure A-18)


NUMERICAL SOLUTION TO BEAM VIBRATIONS UNDER A MOVING COUPLE


Julian J. Wu

U.S. Army Armament Research and Development Command
Large  Caliber  Weapon  Systems  Laboratory 
Benet  Weapons  Laboratory 
Watervliet,  NY  12189 


ABSTRACT. The finite element solution formulation in time- and
space-coordinates is extended to beam vibrations excited by a moving couple.
This problem has direct application to gun motion analysis with an unbalanced
moving projectile. The moving load, instead of being a time-dependent Dirac
delta function as in the case of a moving concentrated force, is now the
derivative of this Dirac delta function. This singular function does not
present any difficulty, owing to the variational process employed. The solution
procedure is described together with results for beam motions under a
couple moving at various speeds.

1. INTRODUCTION. In a previous report [1], this writer presented a
finite element-variational formulation which discretizes the spatial and time
variables in the same manner. The method was applied to a problem of beam
motion subjected to moving concentrated forces. Results were shown to be in
excellent agreement with known solutions. This same formulation is now applied
to the problem of a couple, i.e., a concentrated bending moment.


A recent investigation by S. H. Chu [2] on the interacting forces between
a projectile and the cannon tube indicates that the couple produced by the
eccentricity of the projectile as it moves down the tube may be of such a
magnitude that its effect on the tube motion becomes significant. It is
therefore important that problems involving moving moments can be analyzed
adequately. The purpose of this note is to present the modification necessary
to the previous formulation so that solutions of a beam motion problem
under a moving bending moment can be obtained routinely. Results for a
cantilevered beam subject to such a load are also presented.

2. DIFFERENTIAL EQUATION AND NONDIMENSIONALIZATION. Consider an
Euler-Bernoulli beam subjected to a moving couple M. The differential
equation can be written as

$$ EI\,y'''' + \rho A\,\ddot{y} = -M\,\delta'(x-\bar{x}) \tag{1} $$


where y(x,t) denotes the beam deflection as a function of the spatial coordinate x
and time t. E, I, A, and $\rho$ denote the elastic modulus, second moment of
area, cross-sectional area, and material density, respectively. The Dirac delta
function is denoted by $\delta$, $\bar{x} = \bar{x}(t)$ is the location of M,
a prime (') denotes differentiation with respect to x, and a dot
differentiation with respect to t. Note that the right-hand side of Eq. (1)
has the dimension of force per unit length, since
$\delta'(x-\bar{x}) = \frac{d}{dx}\,\delta(x-\bar{x})$ has the dimension of
(length)$^{-2}$.

Introducing nondimensional quantities

$$ \hat{y} = y/\ell, \qquad \hat{x} = x/\ell, \qquad \hat{t} = t/T, \tag{2} $$

where $\ell$ is the length of the beam and T is a finite time such that the
problem is of interest within 0 < t < T, Eq. (1) can be written as

$$ y'''' + \gamma^2\,\ddot{y} = -Q\,\delta'(x-\bar{x}) \tag{3} $$

The hats (^) have been omitted in Eq. (3) and

$$ \gamma = \frac{c}{T}, \qquad Q = \frac{M\ell}{EI} \tag{4} $$

with

$$ c^2 = \frac{\rho A \ell^4}{EI} $$

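Boundary conditions associated with Eq. (1) or (3) will now be introduced in
conjunction with a variational problem. Consider

$$ \delta I = 0 \tag{5a} $$

with

$$
\begin{aligned}
I = {} & \int_0^1\!\!\int_0^1 \left[\, y''\,y^{*\prime\prime} - \gamma^2\,\dot{y}\,\dot{y}^{*} + Q\,\delta'(x-\bar{x})\,y^{*} \right] dx\,dt \\
& + \int_0^1 dt\,\{\, k_1\,y(0,t)\,y^{*}(0,t) + k_2\,y'(0,t)\,y^{*\prime}(0,t) \\
& \qquad\qquad + k_3\,y(1,t)\,y^{*}(1,t) + k_4\,y'(1,t)\,y^{*\prime}(1,t) \,\} \\
& + \gamma^2 \int_0^1 dx\,\{\, k_5\,[\,y(x,0) - Y(x)\,]\;y^{*}(x,1) \,\}
\end{aligned}
\tag{5b}
$$
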


where y*(x,t) is the adjoint variable of y(x,t). If one takes the first
variation of I considering y(x,t) to be fixed,

$$ (\delta I)_{\delta y = 0} = 0 \tag{5a'} $$

and considers $\delta y^{*}$ to be completely arbitrary, it is easy to see that
Eqs. (5) are equivalent to the differential Eq. (3) and the following boundary
and initial conditions:

$$
\left.
\begin{aligned}
y'''(0,t) + k_1\,y(0,t) &= 0 \\
y''(0,t) - k_2\,y'(0,t) &= 0 \\
y'''(1,t) - k_3\,y(1,t) &= 0 \\
y''(1,t) + k_4\,y'(1,t) &= 0
\end{aligned}
\right\} \qquad 0 < t < 1 \tag{6a}
$$

and

$$
\left.
\begin{aligned}
\dot{y}(x,0) &= 0 \\
\dot{y}(x,1) - k_5\,[\,y(x,0) - Y(x)\,] &= 0
\end{aligned}
\right\} \qquad 0 < x < 1 \tag{6b}
$$

Taking appropriate values for k1, k2, k3, and k4, problems with a wide range
of boundary conditions can be realized. The initial conditions in Eqs. (6b)
state that the beam has zero initial velocity and, if one takes k5 to be
infinite (or a large number compared with unity), that

$$ y(x,0) = Y(x) $$

The meaning of cases where k5 is not so chosen need not concern us here.

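To derive the finite element matrix equations, one begins with Eq. (5a')
and writes

$$ (\delta I)_{\delta y = 0} = 0 \tag{7a} $$

$$
\begin{aligned}
(\delta I)_{\delta y = 0} = {} & \int_0^1\!\!\int_0^1 \left[\, y''\,\delta y^{*\prime\prime} - \gamma^2\,\dot{y}\,\delta\dot{y}^{*} + Q\,\delta'(x-\bar{x})\,\delta y^{*} \right] dx\,dt \\
& + \int_0^1 dt\,[\, k_1\,y(0,t)\,\delta y^{*}(0,t) + k_2\,y'(0,t)\,\delta y^{*\prime}(0,t) \\
& \qquad\qquad + k_3\,y(1,t)\,\delta y^{*}(1,t) + k_4\,y'(1,t)\,\delta y^{*\prime}(1,t) \,] \\
& + \gamma^2 \int_0^1 dx\,\{\, k_5\,[\,y(x,0) - Y(x)\,]\;\delta y^{*}(x,1) \,\}
\end{aligned}
\tag{7b}
$$

Introducing element local variables

$$ \xi = \xi^{(i)} = Kx - i + 1, \qquad \eta = \eta^{(j)} = Lt - j + 1 \tag{8a} $$

or

$$ x = \frac{1}{K}\,(\xi + i - 1), \qquad t = \frac{1}{L}\,(\eta + j - 1) \tag{8b} $$

where K is the number of divisions in x and L the number in t (a typical grid
scheme is shown in Figure 1), Equation (7b) can now be written as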

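$$
\begin{aligned}
& \sum_{i=1}^{K}\sum_{j=1}^{L} \int_0^1\!\!\int_0^1 \left[\, \frac{K^3}{L}\,y''_{(ij)}\,\delta y^{*\prime\prime}_{(ij)} - \frac{\gamma^2 L}{K}\,\dot{y}_{(ij)}\,\delta\dot{y}^{*}_{(ij)} \right] d\xi\,d\eta \\
& + \sum_{j=1}^{L} \int_0^1 d\eta \left[\, \frac{k_1}{L}\,y_{(1j)}(0,\eta)\,\delta y^{*}_{(1j)}(0,\eta) + \frac{k_2 K^2}{L}\,y'_{(1j)}(0,\eta)\,\delta y^{*\prime}_{(1j)}(0,\eta) \right] \\
& + \sum_{j=1}^{L} \int_0^1 d\eta \left[\, \frac{k_3}{L}\,y_{(Kj)}(1,\eta)\,\delta y^{*}_{(Kj)}(1,\eta) + \frac{k_4 K^2}{L}\,y'_{(Kj)}(1,\eta)\,\delta y^{*\prime}_{(Kj)}(1,\eta) \right] \\
& + \sum_{i=1}^{K} \frac{\gamma^2 k_5}{K} \int_0^1 d\xi \left[\, y_{(i1)}(\xi,0)\,\delta y^{*}_{(iL)}(\xi,1) \right] \\
& \quad = -\sum_{i=1}^{K}\sum_{j=1}^{L} \frac{Q}{KL} \int_0^1\!\!\int_0^1 \delta'(x-\bar{x})\,\delta y^{*}_{(ij)}(\xi,\eta)\,d\xi\,d\eta
 + \sum_{i=1}^{K} \frac{\gamma^2 k_5}{K} \int_0^1 d\xi \left[\, Y_{(i)}(\xi)\,\delta y^{*}_{(iL)}(\xi,1) \right]
\end{aligned}
\tag{9}
$$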

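The shape function vector is now introduced. Let

$$
y_{(ij)}(\xi,\eta) = \mathbf{a}^{T}(\xi,\eta)\,\mathbf{Y}_{(ij)}, \qquad
y^{*}_{(ij)}(\xi,\eta) = \mathbf{a}^{T}(\xi,\eta)\,\mathbf{Y}^{*}_{(ij)} = \mathbf{Y}^{*T}_{(ij)}\,\mathbf{a}(\xi,\eta)
\tag{10}
$$

Equation (9) then becomes

$$
\begin{aligned}
& \sum_{i=1}^{K}\sum_{j=1}^{L} \delta\mathbf{Y}^{*T}_{(ij)} \left[\, \frac{K^3}{L}\,\mathbf{A} - \frac{\gamma^2 L}{K}\,\mathbf{B} \right] \mathbf{Y}_{(ij)}
 + \sum_{j=1}^{L} \delta\mathbf{Y}^{*T}_{(1j)} \left[\, \frac{k_1}{L}\,\mathbf{B}_1 + \frac{k_2 K^2}{L}\,\mathbf{B}_2 \right] \mathbf{Y}_{(1j)} \\
& + \sum_{j=1}^{L} \delta\mathbf{Y}^{*T}_{(Kj)} \left[\, \frac{k_3}{L}\,\mathbf{B}_3 + \frac{k_4 K^2}{L}\,\mathbf{B}_4 \right] \mathbf{Y}_{(Kj)}
 + \sum_{i=1}^{K} \delta\mathbf{Y}^{*T}_{(iL)}\,\frac{\gamma^2 k_5}{K}\,\mathbf{B}_5\,\mathbf{Y}_{(i1)} \\
& \quad = -\sum_{i=1}^{K}\sum_{j=1}^{L} \delta\mathbf{Y}^{*T}_{(ij)}\,\frac{Q}{L}\,\mathbf{F}_{(ij)}
 + \sum_{i=1}^{K} \delta\mathbf{Y}^{*T}_{(iL)}\,\frac{\gamma^2 k_5}{K}\,\mathbf{G}_{(i)}
\end{aligned}
\tag{11}
$$
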

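where, as can easily be seen,

$$
\begin{aligned}
\mathbf{A} &= \int_0^1\!\!\int_0^1 \mathbf{a}_{,\xi\xi}\,\mathbf{a}^{T}_{,\xi\xi}\,d\xi\,d\eta, &
\mathbf{B} &= \int_0^1\!\!\int_0^1 \mathbf{a}_{,\eta}\,\mathbf{a}^{T}_{,\eta}\,d\xi\,d\eta, \\
\mathbf{B}_1 &= \int_0^1 \mathbf{a}(0,\eta)\,\mathbf{a}^{T}(0,\eta)\,d\eta, &
\mathbf{B}_2 &= \int_0^1 \mathbf{a}_{,\xi}(0,\eta)\,\mathbf{a}^{T}_{,\xi}(0,\eta)\,d\eta, \\
\mathbf{B}_3 &= \int_0^1 \mathbf{a}(1,\eta)\,\mathbf{a}^{T}(1,\eta)\,d\eta, &
\mathbf{B}_4 &= \int_0^1 \mathbf{a}_{,\xi}(1,\eta)\,\mathbf{a}^{T}_{,\xi}(1,\eta)\,d\eta, \\
\mathbf{B}_5 &= \int_0^1 \mathbf{a}(\xi,1)\,\mathbf{a}^{T}(\xi,0)\,d\xi
\end{aligned}
$$

and

$$
\mathbf{F}_{(ij)} = \int_0^1\!\!\int_0^1 \mathbf{a}(\xi,\eta)\,\delta'_{(ij)}(\xi-\bar{\xi})\,d\xi\,d\eta, \qquad
\mathbf{G}_{(i)} = \int_0^1 \mathbf{a}(\xi,1)\,Y_{(i)}(\xi)\,d\xi
\tag{12}
$$

where $\delta_{(ij)}(\xi-\bar{\xi})$ is the local version of the function
$\delta(x-\bar{x})$ that appeared in Eq. (9). The specific form of
$\delta_{(ij)}(\xi-\bar{\xi})$ will be described later in a paragraph prior to
Eqs. (18).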

Now Eq. (11) can be assembled into a global matrix equation

$$ \delta\mathbf{Y}^{*T}\,\mathbf{K}\,\mathbf{Y} = \delta\mathbf{Y}^{*T}\,\mathbf{F} \tag{13} $$

By virtue of the fact that $\delta\mathbf{Y}^{*}$ is not subject to any
constraints, one has

$$ \mathbf{K}\,\mathbf{Y} = \mathbf{F} \tag{14} $$

which can be solved routinely. Numerical results of several problems in this
class will be presented in a later section.

3. FORCE VECTOR DUE TO A MOVING COUPLE. We shall describe here the
procedure used to arrive at the force vector contributed by a moving
couple. This force vector has appeared in Eq. (12) as

$$ \mathbf{F}_{(ij)} = \int_0^1\!\!\int_0^1 \mathbf{a}(\xi,\eta)\,\delta'_{(ij)}(\xi-\bar{\xi})\,d\xi\,d\eta \tag{15a} $$

Performing integration by parts once, Equation (15a) can be written as

$$ \mathbf{F}_{(ij)} = -\int_0^1\!\!\int_0^1 \mathbf{a}_{,\xi}(\xi,\eta)\,\delta_{(ij)}(\xi-\bar{\xi})\,d\xi\,d\eta \tag{15b} $$

The shape function $\mathbf{a}(\xi,\eta)$ is a vector of dimension 16. In the
present formulation we have chosen the form

$$ a_k(\xi,\eta) = b_i(\xi)\,b_j(\eta), \qquad k = 1,2,3,\ldots,16; \quad i,j = 1,2,3,4 \tag{16a} $$

and

$$ a_{k,\xi}(\xi,\eta) = b_i'(\xi)\,b_j(\eta) \tag{16b} $$

The relations between k and i,j are given in Table I. These are the
consequences of the choice of the shape function such that $\mathbf{Y}_{(ij)}$,
the generalized coordinates of the (ij)th element, represent the displacement,
slope, velocity, and angular velocity at the local nodal points. Thus

$$ b_i(\xi) = \sum_{p=1}^{4} b_{ip}\,\xi^{p-1}; \qquad b_i'(\xi) = \sum_{p=1}^{4} b'_{ip}\,\xi^{p-1} \tag{17} $$

The values of $b_{ip}$ and $b'_{ip}$ are given in Tables II and III.
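
Since the entries of Tables II and III are not reproduced here, the following sketch assumes the standard cubic Hermite polynomials on [0, 1], ordered as value and slope at $\xi = 0$ followed by value and slope at $\xi = 1$. That choice, and the names `B`, `B_PRIME`, `K_TO_IJ`, `b`, and `shape_vector`, are assumptions for illustration; the (i,j)-to-k ordering follows Table I.

```python
import numpy as np

# Assumed coefficients b_ip: row i, column p, with b_i(s) = sum_p B[i-1, p-1] * s**(p-1).
B = np.array([
    [1.0, 0.0, -3.0,  2.0],   # b_1: unit value at s = 0
    [0.0, 1.0, -2.0,  1.0],   # b_2: unit slope at s = 0
    [0.0, 0.0,  3.0, -2.0],   # b_3: unit value at s = 1
    [0.0, 0.0, -1.0,  1.0],   # b_4: unit slope at s = 1
])

# Derivative coefficients b'_ip, obtained by differentiating each polynomial.
B_PRIME = np.zeros_like(B)
B_PRIME[:, :3] = B[:, 1:] * np.arange(1, 4)

# Table I: k = 1..16 mapped to (i, j).
K_TO_IJ = [(1, 1), (2, 1), (1, 2), (2, 2), (3, 1), (4, 1), (3, 2), (4, 2),
           (1, 3), (2, 3), (1, 4), (2, 4), (3, 3), (4, 3), (3, 4), (4, 4)]

def b(i, s):
    """Evaluate b_i at the local coordinate s."""
    return sum(B[i - 1, p] * s ** p for p in range(4))

def shape_vector(xi, eta):
    """Return the 16-vector a_k(xi, eta) = b_i(xi) * b_j(eta) of Eq. (16a)."""
    return np.array([b(i, xi) * b(j, eta) for i, j in K_TO_IJ])
```

With this ordering, `shape_vector(0.0, 0.0)` picks out the first generalized coordinate alone, which is consistent with that coordinate being the displacement at the first local node.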


TABLE I. RELATIONSHIP BETWEEN (i,j) AND k IN EQUATION (16)

    k    (i,j)   |    k    (i,j)
    1    (1,1)   |    9    (1,3)
    2    (2,1)   |   10    (2,3)
    3    (1,2)   |   11    (1,4)
    4    (2,2)   |   12    (2,4)
    5    (3,1)   |   13    (3,3)
    6    (4,1)   |   14    (4,3)
    7    (3,2)   |   15    (3,4)
    8    (4,2)   |   16    (4,4)


TABLE III. VALUES OF b'_ip IN EQUATION (17)


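Now, let us consider $\delta_{(ij)}(\xi-\bar{\xi})$. This "function" represents
the effect of the Dirac delta function $\delta(x-\bar{x})$ on the (ij)th
element. If the curve of travel $\bar{x} = \bar{x}(t)$ does not go through the
element (i,j), $\delta_{(ij)}(\xi-\bar{\xi}) = 0$. If it passes through that
element, one has

$$ \delta_{(ij)}(\xi-\bar{\xi}) = \delta(x-\bar{x}) = K\,\delta(\xi-\bar{\xi}) \tag{18a} $$

with

$$ \bar{\xi} = \bar{\xi}(\eta) \tag{18b} $$

The function $\bar{\xi}(\eta)$ is derived from $\bar{x} = \bar{x}(t)$. For
example, if the force moves with a constant velocity, one has

$$ \bar{x} = \bar{x}(t) = vt \tag{19a} $$

and it follows from Eqs. (8) that

$$ \bar{\xi} = \bar{\xi}(\eta) = -i + 1 + \frac{vK}{L}\,(\eta + j - 1) \tag{19b} $$

With Eqs. (16), (17), (18), and (19), one writes (15) as

$$ F_{(ij)k} = -K \int_0^1\!\!\int_0^1 a_{k,\xi}(\xi,\eta)\,\delta(\xi-\bar{\xi})\,d\xi\,d\eta \tag{20a} $$

$$ F_{(ij)k} = -K \sum_{p=1}^{4}\sum_{q=1}^{4} b'_{ip}\,b_{jq} \int_0^1\!\!\int_0^1 \xi^{p-1}\,\eta^{q-1}\,\delta(\xi-\bar{\xi})\,d\xi\,d\eta \tag{20b} $$

Equation (20) can then be evaluated easily once the exact form of $\bar{\xi}$
is written. For example, if $\bar{\xi} = \eta$, Eq. (20) reduces to

$$
F_{(ij)k} = -K \sum_{p=1}^{4}\sum_{q=1}^{4} b'_{ip}\,b_{jq} \int_0^1 \xi^{\,p+q-2}\,d\xi
 = -K \sum_{p=1}^{4}\sum_{q=1}^{4} \frac{b'_{ip}\,b_{jq}}{p+q-1}
\tag{21}
$$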
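
Equation (21) is simple enough to evaluate directly. The sketch below does so for the case $\bar{\xi} = \eta$, again assuming standard cubic Hermite coefficients in place of the values of Tables II and III, which are not reproduced here; the array and function names are hypothetical. The sign convention follows Eq. (15b).

```python
import numpy as np

# Assumed cubic Hermite coefficients b_ip (row i, column p) and derivatives b'_ip.
B = np.array([
    [1.0, 0.0, -3.0,  2.0],
    [0.0, 1.0, -2.0,  1.0],
    [0.0, 0.0,  3.0, -2.0],
    [0.0, 0.0, -1.0,  1.0],
])
B_PRIME = np.zeros_like(B)
B_PRIME[:, :3] = B[:, 1:] * np.arange(1, 4)

# Table I: k = 1..16 mapped to (i, j).
K_TO_IJ = [(1, 1), (2, 1), (1, 2), (2, 2), (3, 1), (4, 1), (3, 2), (4, 2),
           (1, 3), (2, 3), (1, 4), (2, 4), (3, 3), (4, 3), (3, 4), (4, 4)]

def force_vector_diagonal(K):
    """Eq. (21): F_k = -K * sum_p sum_q b'_ip * b_jq / (p + q - 1)."""
    p = np.arange(1, 5)
    # Entry (p, q) is the integral of s**(p + q - 2) over [0, 1].
    hilbert = 1.0 / (p[:, None] + p[None, :] - 1)
    return np.array([-K * B_PRIME[i - 1] @ hilbert @ B[j - 1] for i, j in K_TO_IJ])
```

For other load paths $\bar{\xi}(\eta)$, the inner integral no longer reduces to $1/(p+q-1)$ and would have to be evaluated from Eq. (20b), over the part of the element that the path actually crosses.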
TABLE IV. DEFLECTION y(x,t)/ℓ OF A CANTILEVERED BEAM UNDER A
MOVING CONCENTRATED MOMENT (T = 10^10 sec.)

    t/T \ x/ℓ |   0       0.25       0.50       0.75       1.00
    ----------+--------------------------------------------------
      0.      |   0.      0.         0.         0.         0.
      0.25    |   0.      .03125     .09375     0.15625    0.21875
      0.50    |   0.      .03125     .12500     0.25000    0.37500
      0.75    |   0.      .03125     .12500     0.28125    0.46875
      1.00    |   0.      .03125


TABLE V. DEFLECTION y'(x,t)/ℓ OF A CANTILEVERED BEAM UNDER A
MOVING CONCENTRATED MOMENT (T = 10^10 sec.)

TABLE VI. DEFLECTION y(x,t)/ℓ OF A CANTILEVERED BEAM UNDER A MOVING CONCENTRATED MOMENT

TABLE VII. DEFLECTION y(x,t)/ℓ OF A CANTILEVERED BEAM UNDER A MOVING CONCENTRATED MOMENT

TABLE VIII. DEFLECTION y(x,t)/ℓ OF A CANTILEVERED BEAM UNDER A MOVING CONCENTRATED MOMENT


4. NUMERICAL DEMONSTRATIONS. Some numerical results will now
be presented. Let us consider a cantilevered beam subjected to a unit couple
moving with a constant velocity

$$ v = \frac{\ell}{T} $$

As T varies from $\infty$ to 0, the velocity varies from 0 to $\infty$.

It will be helpful to compare v with some reference velocity which is
characteristic of the given beam. It is known that for a cantilevered beam,
the first mode of vibration has a frequency (see, for example, [3])

$$ f_1 = \frac{\omega_1}{2\pi} = \frac{1}{2\pi}\,\frac{(1.875)^2}{c} = \frac{0.560}{c} \quad \text{(cycles per second)} $$

and the period

$$ T_1 = 1.786\,c $$

where

$$ c = \left( \frac{\rho A \ell^4}{EI} \right)^{1/2} $$

Consider the vibration as standing waves. They travel at a speed

$$ v_1 = 2\ell f_1 = \frac{1.12\,\ell}{c} $$

Hence, the relative velocity is

$$ \bar{v} = \frac{v}{v_1} = \frac{T_1}{2T} = 0.893\,\frac{c}{T} $$

We shall take c = 1.0 for the moving force problems. Thus, $f_1$ = 0.560 Hz,
$T_1$ = 1.786 sec., and

$$ \bar{v} = 0.893/T $$

Using grid schemes of 4x4 (i.e., four segments in the spatial and four in the
time coordinate) and 8x4, Tables IV through VIII show the beam deflections
(and slopes) as the concentrated moment Q = 1.0 moves from the left to the
right end. Since we have defined T as the time required for the load to
travel from one end to the other, t = 0.5T, for example, indicates the point in
time when the load is at the midspan of the beam.
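
The reference quantities of this section are quick to reproduce. The sketch below uses the first cantilever eigenvalue, 1.875104 (the first root of cos x cosh x = -1), which is slightly more precise than the rounded 1.875 quoted above, so the results agree with the text only to about three figures. The function names are illustrative.

```python
import math

BETA1_L = 1.875104  # first root of cos(x) * cosh(x) = -1 for a cantilever

def first_mode(c=1.0):
    """Return (f1, T1): first natural frequency and period for a given c."""
    f1 = BETA1_L ** 2 / (2.0 * math.pi * c)
    return f1, 1.0 / f1

def relative_velocity(T, c=1.0):
    """Relative velocity v / v1 = T1 / (2 T) of a load crossing the beam in time T."""
    _, T1 = first_mode(c)
    return T1 / (2.0 * T)

if __name__ == "__main__":
    f1, T1 = first_mode()
    print(f"f1 = {f1:.3f} Hz, T1 = {T1:.3f} sec")
    for T in (10.0, 1.0, 0.1):
        print(f"T = {T:5.1f} sec  ->  relative velocity = {relative_velocity(T):.3f}")
```

For the three travel times used in Tables VI through VIII, the load speed ranges from roughly one tenth of the reference speed to roughly nine times it.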

In Tables IV and V, T is set to 10^10 sec., which is extremely large
compared with the beam characteristic time of T_1 = 1.786 sec. The solution
should therefore reduce to that of the static problem. This is indeed the case,
as shown in these two tables. These results were obtained using a 4x4 grid
scheme.
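
The static limit can be checked by hand. For the nondimensional problem with the beam clamped at x = 0 and free at x = 1, integrating y'''' = -Q δ'(x - a) with the free-end conditions gives y'' = Q for x < a and y'' = 0 for x > a. The sketch below evaluates the resulting deflection and compares it with the legible entries of Table IV; the function name and the dictionary of table values are introduced here for illustration.

```python
def static_deflection(x, a, Q=1.0):
    """Static deflection of a cantilever (clamped at x = 0) with a couple Q at x = a."""
    if x <= a:
        return Q * x * x / 2.0          # constant curvature Q up to the load
    return Q * a * (x - a / 2.0)        # straight line beyond the load

# Legible entries of Table IV: load position t/T -> deflections at x = 0.25, 0.50, 0.75, 1.00
TABLE_IV = {
    0.25: [0.03125, 0.09375, 0.15625, 0.21875],
    0.50: [0.03125, 0.12500, 0.25000, 0.37500],
    0.75: [0.03125, 0.12500, 0.28125, 0.46875],
}

if __name__ == "__main__":
    for a, row in TABLE_IV.items():
        computed = [static_deflection(x, a) for x in (0.25, 0.50, 0.75, 1.00)]
        print(a, computed, row)
```

The tabulated finite element values coincide with this closed-form static solution at every legible entry.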

For the results shown in Tables VI through VIII, an 8x4 grid scheme has been
used. The beam deflections for T = 10, 1.0, and 0.1 seconds are shown in
Tables VI, VII, and VIII, respectively.

Finally, these deflection curves are also plotted in Figures 2 through
10. From these figures and the tabulated results, one observes that while
some of the results are extremely good, others change so rapidly with
respect to the time or space variable that an assessment of their accuracy is
very difficult. Hence, further investigation of the numerical convergence of
these data is necessary.

Figure 2. Deflection(s) of a Cantilevered Beam Under a Moving Couple.

Figure 4. Deflection(s) of a Cantilevered Beam Under a Moving Couple.

Figure 5. Deflection(s) of a Cantilevered Beam Under a Moving Couple.

Figure 6. Deflection(s) of a Cantilevered Beam Under a Moving Couple.


ANALYTICAL SOLUTIONS IN NUMERICAL ANALYSIS


J  L  Harris 

US  Army  Missile  Command 
US  Army  Missile  Laboratory 
Redstone  Arsenal,  Alabama  35898 


ABSTRACT. This paper deals with the usage of exact solutions to
differential equations as a means of decreasing the computational burden
associated with guided missile flight simulation. In this type of simulation,
the governing differential equations are typically solved for the highest
order derivative, then numerically solved by forward integration over time,
using some appropriate integration method and a time increment which must be
chosen sufficiently small to represent the various system responses. Many of
the system responses can be represented by linear differential equations, the
solutions to which are well known. By computing the exact solution to the
equation, the time increment (dt) can be chosen as large as the dynamics of
the inputs to the response function will allow, rather than being governed by
the dynamics of the response function itself.

In  general  this  will  result  in  a  computer  time  reduction,  although  there 
may  be  instances  where  this  is  not  true.  An  example  will  be  presented. 

1. INTRODUCTION. The motivation for writing this paper has grown out of
several years of guided missile flight simulation on the part of the author.
Most of this simulation was done digitally. The typical problem involves
the generation of a missile trajectory as it flies toward its target, which
in general is also moving. Fig. 1 illustrates this. This simulation has
been undertaken to determine miss distance relative to the target (aim point),
probability of target kill, maximum intercept range, altitude, and other
performance limiting conditions, such as missile control system requirements.

2. SIMULATION APPROACH. The general approach to digital simulation of
guided missile motion is to arrange the various differential equations which
govern the instantaneous motion into expressions which define the highest
derivative for each variable which must be represented. These equations may
be partial differential equations, and they may be nonlinear with nonconstant
coefficients. They are then numerically integrated by one of the many
numerical integration schemes available. The intent of this paper is not to
treat numerical integration schemes; rather, it is to suggest that occasionally
there are differential equations embedded in the representation which can be
solved analytically, and to propose that these be solved analytically instead
of numerically. Table 1 presents the six equations for missile motion. These
are always present. Control equations, etc., vary from missile to missile.

3. A SIMPLIFIED EXAMPLE. Figure 2 shows a basic block diagram for a
simulation representation. The simulation is driven by the difference in
target and missile position, expressed in inertial space, taking round earth
considerations into account if range and altitude are sufficiently large. The
difference in target and missile positions is used to calculate the orientation
in inertial space of a line connecting the missile and target. Missile
orientation is referenced to inertial space, and the seeker may be referenced to
missile orientation or to inertial space. In either case, the seeker pointing
error is the difference between the seeker centerline and the line of sight
angles. The seeker error is used to drive the seeker in a direction to reduce
the error, and to command the missile to maneuver as needed to produce target
intercept. This command produces a control deflection of some form (fin
deflection, thrust deflection, etc.). The airframe responds to this, and
accelerations are produced, which are integrated to update missile velocity
and position, thus closing the loop.

4.  A  DETAILED  EXAMPLE.  The  seeker  functional  representation  from  Figure  2 
is  shown  with  an  additional  level  of  detail  added  in  Figure  3.  It  is  assumed 
that  the  error  is  multiplied  by  a  constant,  K,  then  input  to  the  seeker  drive 

as  a  rate  command,  6  .  It  is  assumed  that  the  seeker's  response  to  this 

command  can  be  represented  by  a  first  order  differential  equation  (first  order 
response  with  time  constant  t).  This  is  still  a  considerable  simplification 
of  course.  The  commanded  rate  will  be  constant  throughout  the  interval 
between  data  samples.  The  seeker  rate  and  position  are  uniquely  defined  by  the 
solution  of  the  differential  equation  and  insertion  of  the  appropriate  time 
as  shown  in  Figure  3.  Obviously  the  time  of  interest  is  the  time  of  taking  a 
new  data  sample.  This  solution,  shown  in  Figure  3,  involves  an  exponential 
which  is  typically  time  consuming  to  evaluate,  but  for  a  constant  sampling 
interval  and  a  constant  time  constant  it  needs  to  be  evaluated  only  once,  at 
problem  initiation-  Figure  3  also  shows  the  numerical  burden  of  calculating  this 
response  numerically  with  a  very  simple  algorithm.  It  can  be  seen  by  inspection 
that  the  numerical  burden  is  exactly  equal  for  the  two  solutions,  per  step. 

But  to  achieve  acceptable  accuracy  it  would  typically  be  required  to  break  the 
sampling  interval  into  sub-intervals  for  the  case  of  numerical  integration, 
even  if  a  sophisticated  integration  algorithm  were  used.  The  number  of 
subintervals  would  depend  on  the  ratio  of  the  sampling  interval  to  the  time 
constant. 

For the analytical (i.e., exact) solution there is no integration error.
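
The comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the seeker rate r is taken to obey the first-order lag τ·dr/dt = cmd − r with the command held constant over the sampling interval, and all names (exact_step, euler_step, the sample values of dt and tau) are hypothetical rather than taken from Figure 3.

```python
import math

def exact_step(rate, pos, cmd, dt, decay, tau):
    """Exact update of a first-order lag over one sampling interval.

    decay = exp(-dt / tau) is passed in because, for a constant sampling
    interval and time constant, it only has to be computed once.
    """
    new_rate = cmd + (rate - cmd) * decay
    new_pos = pos + cmd * dt + (rate - cmd) * tau * (1.0 - decay)
    return new_rate, new_pos

def euler_step(rate, pos, cmd, dt, tau, substeps=1):
    """Simple (Euler) numerical integration over the same interval."""
    h = dt / substeps
    for _ in range(substeps):
        # Both updates use the values from the start of the sub-interval.
        rate, pos = rate + h * (cmd - rate) / tau, pos + h * rate
    return rate, pos

dt, tau, cmd = 0.1, 0.05, 1.0          # assumed sample values
decay = math.exp(-dt / tau)            # evaluated once, at problem initiation
print("exact        :", exact_step(0.0, 0.0, cmd, dt, decay, tau))
print("Euler, 1 step:", euler_step(0.0, 0.0, cmd, dt, tau))
print("Euler, 100   :", euler_step(0.0, 0.0, cmd, dt, tau, substeps=100))
```

With the sampling interval twice the time constant, as assumed here, a single Euler step per sample is badly wrong, and many sub-intervals are needed to approach the exact result that one analytical step delivers.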

5.  ON  APPLICATION.  Most  of  the  so-called  detail  which  we  insert  into 
missile  simulations  consists  of  response  functions  (transfer  functions)  which 
represent  differential  equations  with  constant  coefficients,  the  solutions  to 
which are well known and catalogued. For the sake of computation speed
these  should  not  be  treated  the  same  as  the  other  differential  equations  which 
cannot  be  solved  exactly.  The  analytical  solutions  should  be  taken  advantage  of. 

In  the  sampled-data  seeker  case  chosen  for  illustration  the  application  is 
straightforward. Moving into the control system, not shown in detail,
the  situation  is  more  complicated  because  the  data  flow  becomes  analog,  and 
the  concept  of  "sampling  interval"  is  lost.  Still,  to  do  this  simulation  digitally 
some  time  interval  (or  intervals)  is  chosen  by  which  to  advance  the  solution. 

Use  can  still  be  made  of  exact  solutions  in  many  cases. 

FIGURE 1. Representative problem: missile flight simulation.

FIGURE 2. Digital simulation approach: functional representation, passive homing.

FIGURE 3. Sampled data seeker. The computational load is equal, for this case.


A  STRATEGY  FOR  INTEGRATING  PROGRAM  TESTING  AND  ANALYSIS 


Leon  Osterweil 
University  of  Colorado 
Boulder,  Colorado 


ABSTRACT. This paper presents a view of how the techniques
of static analysis and dynamic program testing can be
combined and integrated into a tool-supported methodology
which smoothly incorporates the best features of each. The
paper is composed of two major components. The first is
more general and descriptive. In it the central importance
of dynamic testing by means of programmer-generated assertions
is stressed first, and some remarks about tool support
for assertion testing are made. Various weaknesses of
dynamic testing are then remarked upon, motivating the
desirability of using static analysis as well. The general
characteristics of static analysis, and especially data flow
analysis, are described next. Static analysis is then
described as a technique for making certain kinds of dynamic
testing more efficient and trustworthy. Symbolic execution
and formal verification are presented next and described as
logical and important components of the integrated system
being described. The second component of the paper deals
with TOOLPACK, an integrated ensemble of tools of the types
described in the first component. The architecture and high
level design of TOOLPACK are described, and some implementation
plans are presented.

I. INTRODUCTION. Software engineering is a discipline
which has recently been experiencing a period of considerable
but unstructured growth. It now shows signs of embarking
upon a phase of coordination and consolidation. There
has been a large amount of work devoted to the development
of software engineering tools. This seems to be particularly
promising work, as tools are vehicles for capturing software
engineering concepts in a way which is tangible and useful
to software practitioners. Through well-implemented tools,
desirable policies can be promulgated and enforced
throughout a project, in a way which increases the coordination
and efficiency of that project.

In the past, the quality of tools produced has been
spotty. Worse, however, the goals of most tools and the
domains of their efficacy have rarely been clearly enunciated.
As a consequence, it has been difficult for the community
of software practitioners to select tools appropriate
for facilitating work on the specific tasks comprising
their software development activities. Thus specification
of the goals and domains of efficacy of a tool should be an
important part of its documentation. The availability of
such specifications should enable practitioners to intelligently
select and configure a set of tools into an environment
capable of supporting specific software production
activities.

In  this  paper  we  propose  a  generic  configuration  of 
tool  capabilities  and  an  approach  to  building  an  integrated 
system  of  such  tools,  called  TOOLPACK. 

II. CLASS ONE - DYNAMIC TESTING AND ANALYSIS TOOLS.
The terms dynamic testing and dynamic analysis, as used
here, are intended to describe most of the systems known as
execution monitors, software monitors and dynamic debugging
systems ([Balz 69], [Fair 75], [Stuc 75] and [Gris 70]).

In dynamic testing systems, a comprehensive record of a
single execution of a program is built. This record, the
execution history, is usually obtained by instrumenting
the source program with code whose purpose is to capture
information about the progress of the execution. Most such
systems implant monitoring code after each statement of the
program. This code captures such information as the number
of the statement just executed, the names of those variables
whose values have been altered by executing the statement,
the new values of these variables, and the outcome of any
tests performed by the statement. The execution history is
saved in a file so that after the execution terminates it
can be perused by the tester. This perusal is usually
facilitated by the production of summary tables and statistics
such as statement execution frequency histograms and
variable evolution trees.
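
As a rough modern illustration of an execution history, the following Python sketch records, for one run of a small procedure, each line reached and the local variable values at that point, and then summarizes the run as a statement frequency histogram. It relies on the interpreter's tracing hook (sys.settrace) instead of implanting monitoring code in the source, which is a substitution for the instrumentation approach described above; run_with_history and sum_to are hypothetical names.

```python
import sys
from collections import Counter

def run_with_history(func, *args):
    """Run func(*args) and return (result, execution history).

    Each history record is (line offset within func, copy of the local
    variables as they stood just before that line executed).
    """
    history = []
    first_line = func.__code__.co_firstlineno

    def local_trace(frame, event, arg):
        if event == "line":
            history.append((frame.f_lineno - first_line, dict(frame.f_locals)))
        return local_trace

    def global_trace(frame, event, arg):
        # Only monitor the procedure under test.
        return local_trace if frame.f_code is func.__code__ else None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, history

def sum_to(n):
    total = 0
    for k in range(1, n + 1):
        total += k
    return total

result, history = run_with_history(sum_to, 3)
histogram = Counter(offset for offset, _ in history)
for offset in sorted(histogram):
    print(f"line +{offset}: executed {histogram[offset]} time(s)")
```

The history list plays the role of the saved execution history, and the histogram is one of the summary tables a tester might peruse afterwards.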

Despite the existence of such tables and statistics, it
is often quite difficult for a human tester to detect the
source or even the presence of errors in the execution.
Hence, many dynamic testing systems also monitor each statement
execution, checking for such error conditions as division
by zero and out-of-bounds array references. The monitors
implanted are usually programmed to automatically issue
error messages immediately upon detecting such conditions in
order to avoid having the errors concealed by the bulk of a
large execution history.

Some of this can be exemplified with the aid of a
simple-minded program. Figure 1 shows a program whose purpose
is to produce the areas of rectangles and triangles having
integer dimensions, when the dimensions are given as input.
The program, a procedure called area, is divided into two
major functional portions. One function, implemented by
procedure lookup, returns the area of the triangle or rectangle
by using a table lookup. The two dimensions input
for the object are used as the first two indices into the
table, a three-dimensional array, A. If the area of a rectangle
is desired, the value 1 must be input with the dimensions;
a value of 2 indicates that the area of a triangle is
desired. A value of 0 causes the lookup loop to terminate.
The value 1 or 2 is used as the third indexing coordinate
into array A.

Array A is initialized by the second functional portion
of the program, implemented by the procedure init. This procedure
initializes A in a somewhat indirect way, perhaps
motivated by an interest in eliminating the need for
multiplications.

In Figure 2 we see the same program augmented by the
code necessary to monitor for two types of errors: division
by zero and out-of-bounds array reference. This
monitor-augmented program is typical of the code which would
be generated automatically by a straightforward dynamic test
tool. The monitors are positioned so as to assure that any
occurrence of either of the two errors will be detected
immediately before it would occur in the actual execution of
the program. To a human observer it is obvious that many of
these probes are redundant. We shall be very much concerned
with studying the forms of automated analysis necessary to
suppress such probes.
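
Since the figures are not reproduced here, the flavor of monitor-augmented code can be suggested with a small Python sketch. It is a hypothetical stand-in, not the program of Figure 2: the procedure fill_row, the bound N = 20, and the probe labels Q1 to Q6 are all assumptions made for illustration.

```python
class ProbeFailure(Exception):
    """Raised by a monitor immediately before the error would occur."""

def probe_subscript(label, value, low, high):
    if not (low <= value <= high):
        raise ProbeFailure(f"{label}: subscript {value} outside {low}..{high}")

def probe_divisor(label, value):
    if value == 0:
        raise ProbeFailure(f"{label}: division by zero")

N = 20  # assumed upper bound of each table dimension

def fill_row(table, i, b):
    """Fill table[i][1..b] with i*j by repeated addition (hypothetical)."""
    probe_subscript("Q1", i, 1, N)
    probe_subscript("Q2", 1, 1, N)        # constant subscript: always passes
    table[i][1] = i
    for j in range(2, b + 1):
        probe_subscript("Q3", i, 1, N)    # i unchanged since Q1: redundant
        probe_subscript("Q4", j, 1, N)
        probe_subscript("Q5", j - 1, 1, N)
        table[i][j] = table[i][j - 1] + i

def mean(total, count):
    probe_divisor("Q6", count)
    return total / count

table = [[0] * (N + 1) for _ in range(N + 1)]
fill_row(table, 3, 4)
print(table[3][1:5])
```

A mechanically generated probe set of this kind checks every subscript at every reference, which is why a reader can spot several tests (Q2 and Q3 here) that can never fail.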

Some  systems  ([Fair  75],  [Stuc  75])  additionally  allow 
the  human  program  tester  to  create  his  own  monitors  and 
direct  their  implantation  anywhere  within  the  program. 

The greatest power of these systems is derived from the
possibility of using them to determine whether a program
execution is proceeding as intended. The intent of the program
is captured by sets of assertions about the desired and
correct relation between values of program variables. These
assertions may be specified to be of local or global validity.
The dynamic testing system creates and places monitors
as necessary to determine whether the program is behaving
in accordance with asserted intent as execution
proceeds.

Figure 3 shows how the example program might be annotated
with assertions. These assertions are designed to
capture the intent of the program and explicitly state certain
non-trivial error conditions to which this program
seems particularly vulnerable. Figure 4 shows how the code
of Figure 1 might be augmented in order to test dynamically
for adherence to or violation of the assertions shown in
Figure 3. It should be clear from this example that dynamic
assertion verification offers the possibility of very meaningful
and powerful testing. With this technique, the tester
can in a convenient notation specify the precise desired
functional behavior of the program (presumably by drawing
upon the program's design and requirements specifications).
Every execution is then tirelessly monitored for adherence
to these specifications. This sort of testing obviously can
focus on the most meaningful aspects of the program far more
sharply than the more mechanical approaches involving monitoring
only for violations of certain standards such as zero
division or array bounds violation.
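
A minimal Python sketch of dynamic assertion checking follows. It does not reproduce the assertions of Figure 3; the procedure rectangle_area, its bounds, and the three assertions are assumptions chosen to show an entry condition, a "parameters are not altered" condition, and an assertion of functional equality checked on every execution.

```python
def rectangle_area(b, h):
    """Compute b*h by repeated addition, monitored by assertions (hypothetical)."""
    # Entry assertion: dimensions lie within the assumed table bounds.
    assert 1 <= b <= 20 and 1 <= h <= 20, "dimension out of range"
    b_in, h_in = b, h                 # entry values, saved for the exit checks

    area = 0
    for _ in range(h):
        area += b                     # the indirect computation under test

    # Exit assertions: the parameters were not used as outputs, and the
    # result satisfies the intended functional relation.
    assert (b, h) == (b_in, h_in), "parameters must not be altered"
    assert area == b_in * h_in, "area must equal b*h"
    return area

print(rectangle_area(3, 4))
```

Note that Python removes assert statements when run with the -O option, which mirrors the idea that the monitoring code exists only for test runs.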

From the preceding discussion it can be seen that
dynamic testing is a powerful technique for detecting the
presence of errors. Because its results are applicable only
to a single execution, however, it cannot be used in any
obvious way to effectively demonstrate the absence of
errors. Thus, it is not by itself an appropriate technique
for verification (i.e., the process of showing that a program
necessarily behaves as intended). Furthermore,
although the assertions used for dynamic verification may
themselves be valuable documentation of intent, dynamic
testing does not itself create useful documentation of the
nature of the program itself. Finally it is important to
observe that the benefits of dynamic testing can only be
derived as the result of heavy expenditures of machine
storage and execution time.

III. CLASS TWO - STATIC ANALYSIS TOOLS. In the
category of static analysis tools, we include all programs
and systems which infer results about the nature of a program
from consideration and analysis of a complete model of
some aspect of the program. An important characteristic of
such tools is that they do not necessitate execution of the
subject program yet infer results applicable to all possible
executions.

A very straightforward example of such a tool is a syntax
analyzer. With this tool the individual statements of a
program are examined one at a time. At the end of this scan
it is possible to infer that the program is free of syntactic
errors.

A more interesting example is a tool such as FACES
[Rama 75] or RXVP [Mill 74] which performs a variety of more
sophisticated error scans. These tools both, for example,
perform a scan to determine whether all procedure invocations
are correctly matched to the corresponding definitions.
The lengths of corresponding argument and parameter
lists are compared, and the corresponding individual parameters
and arguments are also compared for type and dimensionality
agreement. By comparing every procedure invocation
with its corresponding definition in this way it is
possible to assure that the program is free of any possibility
of such a mismatch error. Note that this analysis
requires no program execution, yet produces a result applicable
to all possible executions. This sort of analysis,
requiring a comparison of combinations of statements, can
also be used to demonstrate that a program is free of such
defects as illegal type conversions, confusion of array
dimensionality, superfluous labels and missing or uninvoked
procedures.
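
The invocation/definition comparison can be sketched as follows. This is an illustrative Python fragment, not a description of how FACES or RXVP work internally; the signature table, the call-site records, and the (type, dimensionality) encoding are all assumptions.

```python
# Hypothetical procedure definitions: name -> list of (type, dimensionality).
DEFINITIONS = {
    "init": [("integer", 0), ("integer", 0), ("integer", 3)],
    "lookup": [("integer", 0), ("integer", 0), ("integer", 0), ("integer", 3)],
}

def check_invocations(definitions, invocations):
    """Compare every invocation with its definition, without running anything.

    invocations is a list of (call site, procedure name, argument list).
    An empty result means no length, type or dimensionality mismatch exists.
    """
    problems = []
    for site, name, args in invocations:
        params = definitions.get(name)
        if params is None:
            problems.append(f"{site}: procedure {name} is not defined")
            continue
        if len(args) != len(params):
            problems.append(
                f"{site}: {name} takes {len(params)} arguments, {len(args)} given")
            continue
        for pos, (arg, param) in enumerate(zip(args, params), start=1):
            if arg != param:
                problems.append(
                    f"{site}: argument {pos} of {name} is {arg}, expected {param}")
    return problems

calls = [
    ("stmt 2", "init", [("integer", 0), ("integer", 0), ("integer", 3)]),
    ("stmt 9", "lookup", [("integer", 0), ("integer", 0), ("integer", 3)]),
    ("stmt 12", "init", [("real", 0), ("integer", 0), ("integer", 2)]),
]
for message in check_invocations(DEFINITIONS, calls):
    print(message)
```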

Data flow analysis is a still more sophisticated form
of static analysis which is based upon consideration of
sequences of events occurring along the various paths
through a program. As such it is capable of more powerful
analytic results than combinational scans such as those just
described. The DAVE System [Oste 76], [Fosd 76] is a good
example of such a tool. This system examines all paths originating
from the start of a FORTRAN program and is capable
of determining that no path, when executed, will cause a
reference to an uninitialized variable. DAVE also examines
all paths originating from a variable definition and is
capable of determining whether or not there is a subsequent
reference to the variable. A definition not subsequently
referenced is called a "dead" definition. Hence DAVE is
also capable of showing that a FORTRAN program is free of
dead variable definitions.

Data flow analysis is based upon examination of a flow
graph model of the subject program. The flow graph of every
program unit is created and its nodes are annotated with
descriptions of the uses of all variables at all nodes.
Nodes representing procedure invocations cannot be annotated
in this way immediately. Figure 5 shows the collection of
three annotated flowgraphs which would be created to
represent the variable usage by the statements of the example
program of Figure 1. Procedures such as init and lookup
which invoke no others are completely annotated. For such
procedures a data flow analyzer like DAVE would determine
the presence or absence of uninitialized variable references
and dead variable definitions. This can be done by using
data flow analysis algorithms such as LIVE and AVAIL [Hech
75] to efficiently determine the usage patterns of the program
variables along the paths leading into or out of a program
node. Having done this, it is possible to complete the
data flow analysis of the main program. The details of this
procedure can be found in [Fosd 76].
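
The two checks can be illustrated on a small annotated flow graph. The Python sketch below is not DAVE's algorithm; it is a plain iterative fixed-point computation, assuming each node is annotated with the sets of variables it references and defines (references taken to precede definitions within a node), and the seven-node graph is hypothetical.

```python
def analyze(nodes, edges, entry):
    """nodes: {id: (uses, defs)}; edges: list of (source, target).

    Returns (possibly uninitialized references, dead definitions),
    each as a set of (node, variable) pairs.
    """
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)
    all_vars = set().union(*(u | d for u, d in nodes.values()))

    undef_in = {n: set() for n in nodes}   # variables possibly undefined on entry
    undef_in[entry] = set(all_vars)
    live_out = {n: set() for n in nodes}   # variables referenced later on some path
    changed = True
    while changed:                         # iterate to a fixed point
        changed = False
        for n in nodes:
            new_in = set(undef_in[n])
            for p in preds[n]:
                new_in |= undef_in[p] - nodes[p][1]
            new_out = set()
            for s in succs[n]:
                new_out |= nodes[s][0] | (live_out[s] - nodes[s][1])
            if new_in != undef_in[n] or new_out != live_out[n]:
                undef_in[n], live_out[n] = new_in, new_out
                changed = True

    uninit = {(n, v) for n, (uses, _) in nodes.items() for v in uses & undef_in[n]}
    dead = {(n, v) for n, (_, defs) in nodes.items() for v in defs - live_out[n]}
    return uninit, dead

# Hypothetical graph: read n; i = 1; while i <= n: s = s + i; i = i + 1;
# then t = 0; print s.
nodes = {
    1: (set(), {"n"}), 2: (set(), {"i"}), 3: ({"i", "n"}, set()),
    4: ({"s", "i"}, {"s"}), 5: ({"i"}, {"i"}), 6: (set(), {"t"}),
    7: ({"s"}, set()),
}
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 3), (3, 6), (6, 7)]
uninit, dead = analyze(nodes, edges, entry=1)
print("possibly uninitialized references:", sorted(uninit))
print("dead definitions:", sorted(dead))
```

In this graph s is referenced at nodes 4 and 7 without a prior definition on some path, and the definition of t at node 6 is never referenced.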

In summary we have seen that static analysis can be
used to determine the presence or absence of certain classes
of errors and to produce certain kinds of program documentation.
Hence it is useful as an adjunct to a testing procedure
and offers weak verification capabilities. It is
also useful in supplying limited forms of documentation
(e.g., the input/output behavior of a procedure's parameters
and global variables). There is currently ongoing research
which indicates that static analysis, particularly data flow
analysis, can be used to both verify and test for wider
classes of errors, as well as to produce additional forms of
documentation (e.g., [Tayl 80]).

Of particular interest to us here is the possibility of
using static data flow analysis to suppress certain of the
probes generated by dynamic assertion verification tools as
part of a comprehensive test procedure. As noted earlier,
many of these probes generated by dynamic test aids are
redundant. Their presence adds to the size and execution
time of a test run yet has no diagnostic value. Hence an
automatic procedure which removes them makes testing more
efficient. It also serves to focus attention on the importance
of exercising the remaining probes. Sometimes it is
possible to remove all the probes generated by an assertion
or single error criterion. In this case, it has been de
facto demonstrated that the error being tested for cannot
occur, and this aspect of the program's behavior has been
verified. This perspective shows how testing and verification
activities can be coordinated with each other.

For a specific example of this, let us examine the program
in Figure 2. We will demonstrate how the three static
analysis approaches - line-by-line, combinational and data
flow - can remove progressively more error probes. It is
perhaps illuminating to observe that what is being contemplated
here is actually code optimization in the classical
sense (e.g., see [Alle 76], [Scha 73]). We are attempting to
identify and remove redundant code in some cases and to move
code to more advantageous positions in other cases. Even
the techniques employed are directly derivative from optimization
techniques.

A  straightforward  line-by-line  scan  of  the  program  in 
Figure  2  will  suffice  to  remove  several  test  probes. 
Clearly  the  inequality  tests  in  statements  E2,  E6,  and  E9 
must  always  be  true.  Hence  no  more  sophisticated  analysis 
is  needed  to  justify  the  removal  of  these  probes. 

A  combinational  examination  of  contiguous  sequences  of 
tests  can  eliminate  other  probes.  For  example,  E4  and  E7 
contain  identical  tests,  without  any  intervening  flow  of 
control  or  test  variable  alteration.  Hence  one  of  the  tests 
can be removed. Similarly, either E10 or E13 can be
removed, and either E11 or E14 can be removed. This sort of
probe  removal  is  based  upon  analysis  that  is  quite  similar 
to  "peephole  optimization"  [Scha  73]. 
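
The line-by-line and combinational scans described above can be sketched as a single pass over a straight-line sequence of statements and probes. The Python below is a hypothetical illustration: the item encoding and the labels Q1 to Q6 are assumptions and do not correspond to the probes E1 to E22 of Figure 2.

```python
def prune_probes(items):
    """Remove probes that a line-by-line or combinational scan shows redundant.

    items is a list of tuples:
      ("probe", label, expr, low, high)  range test low <= expr <= high
      ("assign", variable)               a statement that alters the variable
      ("label",)                         a point where control flow may join
    expr is an integer constant or a string such as "i" or "j - 1".
    """
    kept = []
    established = set()          # (expr, low, high) tests known to have passed
    for item in items:
        kind = item[0]
        if kind == "probe":
            _, label, expr, low, high = item
            if isinstance(expr, int):
                if low <= expr <= high:
                    continue     # line-by-line: constant is always in range
            elif (expr, low, high) in established:
                continue         # combinational: identical test already made
            else:
                established.add((expr, low, high))
        elif kind == "assign":
            changed = item[1]
            established = {t for t in established if changed not in t[0].split()}
        else:
            established.clear()  # control may arrive here from elsewhere
        kept.append(item)
    return kept

sequence = [
    ("probe", "Q1", "i", 1, 20),
    ("probe", "Q2", 1, 1, 20),       # constant subscript
    ("assign", "A"),
    ("probe", "Q3", "i", 1, 20),     # repeats Q1, i not altered in between
    ("probe", "Q4", "j", 1, 20),
    ("assign", "j"),
    ("probe", "Q5", "j", 1, 20),     # j was altered, so this test stays
    ("label",),
    ("probe", "Q6", "i", 1, 20),     # after a join point nothing is assumed
]
print([item[1] for item in prune_probes(sequence) if item[0] == "probe"])
```

Only facts visible within the contiguous sequence are used, which is what distinguishes this peephole-style scan from the data flow analysis discussed next.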

Additional  probe  removal  can  be  justified  by  data  flow 
analysis  arguments.  This  analysis  could  be  used  to  remove 
the  test  probes  at  E4  and  E7,  as  well  as  the  probes  at  E19 
and  E22.  It  should  be  noted  that  this  analysis  is  more 
powerful  than  the  combinational  analysis  outlined  above,  and 
thus  capable  of  justifying  the  removal  of  the  probes  named 
earlier. Some insight into procedures for the removal of
such probes can be found in [Oste 77] and [Boll 79].

Static analysis can also be used to justify the deletion
of certain probes inserted in response to assertions.
Note that assertion A1 in Figure 3 expands to probe statements
P1,1; P1,2; P1,3; P1,4; and P1,5. Assertion A4 also
expands to 5 probes in the program in Figure 4. All of
these probes could be avoided if a static scan were used
first to determine which (if any) of the procedure parameters
were used as outputs by the procedure.

In  this  case  static  analysis  can  be  used  to  remove  all 
probes  resulting  from  an  assertion.  Hence  verification  of 
the  assertion  can  be  achieved.  On  the  other  hand,  we  saw 
that  many,  but  not  all,  of  the  subscript  range  checking 
probes  can  be  removed  by  static  analysis.  We  shall  shortly 
show  that  some  additional  probes  can  be  removed  by  using 
symbolic  execution  and  constraint  solving. 

We  have  thus  shown  that  there  are  significant  assertion 
types  and  error  categories  which  can  be  completely  verified 
through  static  analysis.  It  seems  important  to  determine 
which other assertion types and error categories give rise
to  probes  which  can  be  partially  or  totally  removed  by 
static  analysis.  This  is  currently  an  open  research  area. 
It  is  clear,  however,  that  assertions  of  functional  equality 
such  as  A2  and  A3  are  beyond  easy  verification  by  static 
analysis.  Furthermore,  the  removal  of  subscript  range  test 
probes  involving  functions  of  test  variables  (e.g.,  1  <=  J-1 
<=  20  in  E8)  seems  to  require  either  a  set  of  special  case 
static  analyses  or  a  different  more  general  form  of 
analysis. We discuss such a different type of analysis
next.

IV. CLASS THREE - SYMBOLIC EXECUTION TOOLS. By symbolic
execution we mean the process of computing the values
of a program's variables as functions which represent the
sequence of operations carried out as execution is traced
along a specific path through the program. If the path
symbolically executed is a path from a procedure start node to
an output statement, then the symbolic execution will show
the functions by which all of the output values are computed.
The only unknowns in these functions will be the
input values (either parameters in the case of an invoked
procedure or read-in values when a main program is being
symbolically executed).

Thus for example, suppose we symbolically execute the
path 1, 2, 32, 3, 4, 5, 6, 7, 8, 9, 10, 11, 10, 11 in the
program shown in Figure 1. At node 8 the value of i will be
given by "1", and the value of A(1,1,1) will also be given
by "1". After node 10 has been executed the first time, the
value of j will be given by "2", and A(1,2,1) will be given by
"1 + 1". The next time node 10 is symbolically executed j
will be "3" and A(1,3,1) will be "1 + 1 + 1". If the path 8,
9, 10, 11, 10 is symbolically executed, then when node 8 is
reached the value of i will be an unknown and hence
represented by "i". The value of A(i,1,1) will likewise be
represented by "i". When node 10 is reached for the first
time j will receive the value "2" and A(i,2,1) will receive
the value "i + i". Similarly, the next time node 10 is
reached j will receive the value "3" and A(i,3,1) will
receive the value "i + i + i".

A small number of symbolic execution tools has been
built [Howd 78], [King 76], [Clar 76]. These tools mechanize
the creation of the formulas and maintain incremental
symbol tables. They employ formula simplification heuristics
in an attempt to forestall the growth in size of the
generated formulas and foster recognition of the underlying
functional relations. (It should be noted, however, that
these simplifiers do not take roundoff error into account
and, therefore, may misrepresent the actual function computed
by a sequence of floating-point computations.) Hence
a symbolic execution tool would report the value of A(i, 3,
1) after two iterations of the loop at node 9 to be "3 * i."
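
The trace and its simplification can be mimicked with a short Python sketch. It assumes, from the values quoted in the text, that node 8 performs A(i,1,1) = i and that node 10 performs A(i,j,1) = A(i,j-1,1) + i; Figure 1 is not reproduced here, so these statements, like the names symbolic_row and simplify, are assumptions. The simplifier only collects like terms of a sum and is far cruder than the heuristics of the tools cited.

```python
from collections import Counter

def symbolic_row(i_value, iterations):
    """Symbolically execute the assumed loop body for a number of passes.

    i_value is the symbolic value of i on entry ("1", or "i" when unknown).
    Returns a list of (value of j, formula for A(i,j,1)) after each pass.
    """
    store = {("A", 1): [i_value]}      # formula kept as a list of summed terms
    trace = []
    j = 1
    for _ in range(iterations):
        j += 1
        store[("A", j)] = store[("A", j - 1)] + [i_value]
        trace.append((j, " + ".join(store[("A", j)])))
    return trace

def simplify(formula):
    """Collect like terms of a sum, e.g. "i + i + i" becomes "3 * i"."""
    counts = Counter(term.strip() for term in formula.split("+"))
    parts, constant = [], 0
    for term, n in counts.items():
        if term.isdigit():
            constant += int(term) * n
        else:
            parts.append(term if n == 1 else f"{n} * {term}")
    if constant or not parts:
        parts.append(str(constant))
    return " + ".join(parts)

for j, formula in symbolic_row("i", 2):
    print(f"j = {j}: A(i,{j},1) = {formula}  simplified: {simplify(formula)}")
```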

The foregoing discussion strongly indicates that symbolic
execution is an excellent technique for documenting a
program.  Symbolic  traces  provide  documentation  of  the 
actual  functioning  of  a  program  along  any  specific  path.  In 
order  to  use  symbolic  execution  as  a  technique  for  testing 
and  verification  however,  it  is  necessary  to  augment  the 
technique  with  a  constraint  solving  capability. 

In  order  to  clarify  this,  let  us  begin  by  observing 
that  the  above  described  functional  behavior  occurs  only 
when the given path is executed. In general, however, a
given  program  can  execute  an  (often  infinite)  variety  of 
paths, depending upon the program's input values. The conditions
under which a given path is executed can often be
determined  by  symbolic  execution  and  constraint  solution. 
Consider  the  program  given  in  Figure  1,  as  represented  by 
the  flowgraph  in  Figure  5.  Each  edge  of  the  flowgraph  can 
be  labeled  by  a  predicate  describing  the  conditions  under 
which  the  edge  will  be  traversed.  Thus  for  example,  the 
edge  (7,8)  is  labeled  "h  >  1",  the  edge  (9,10)  is  labeled  "b 
>  2",  (5,6)  is  labeled  "h  <  20"  and  edge  (11,10)  is  labeled 
"j  <  b"  (note  that  node  11  is  assumed  to  represent  the  loop 
incrementation  and  termination  test  operations).  Sequential 
control  flow  edges  such  as  (8,9)  and  (10,11)  are  labeled  by 
the predicate "true." Now clearly a given path will be executed
if and only if all of the predicates attached to all
of  the  path  edges  are  satisfied.  Unfortunately,  a  simple 
textual  scan  will  express  these  constraints  only  in  terms  of 
the  variables  within  the  statements.  Thus  the  constraints 
will  in  general  not  show  their  underlying  interrelations. 
If  the  constraints  are  expressed  in  terms  of  the  formulas 
derived  through  symbolic  execution  of  the  path,  then  a  set 
of constraints all expressed in terms of the program's input
values  is  obtained.  Any  solution  of  this  set  of  constraints 
is  a  set  of  input  values  sufficient  to  force  execution  of 
the  given  path. 
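
A small Python sketch may make the substitution step concrete. The program fragment, its two inputs x and y, and the edge predicates are hypothetical, and the exhaustive search over a small finite domain merely stands in for a real constraint solver; a result of None means only that no solution exists within that domain.

```python
import itertools
import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}

# Hypothetical fragment:  read x, y;  a = x + y;  if a > 10: b = a - x;
# if b < 3: (target statement).
# Symbolic execution of the path expresses each variable in terms of inputs.
symbolic = {
    "a": lambda x, y: x + y,
    "b": lambda x, y: (x + y) - x,
}

# Edge predicates as a textual scan would give them, over program variables.
path_predicates = [("a", ">", 10), ("b", "<", 3)]

def solve_path(symbolic, predicates, domain, n_inputs):
    """Return the first input tuple satisfying every path predicate, or None."""
    for inputs in itertools.product(domain, repeat=n_inputs):
        if all(OPS[op](symbolic[var](*inputs), const)
               for var, op, const in predicates):
            return inputs
    return None

print(solve_path(symbolic, path_predicates, range(0, 21), 2))
print(solve_path(symbolic, path_predicates + [("a", "<", 3)], range(0, 21), 2))
```

Written over program variables, "a > 10" and "b < 3" show no relation to each other; after substitution they become x + y > 10 and y < 3, and any solution is a test input that drives execution down the path.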

It is important to observe that some constraint systems
are unsatisfiable, indicating that the path spawning them is
unexecutable. We shall make important use of this shortly.
No less important is the observation that the problem of
determining a solution to an arbitrary system of constraints
is in general unsolvable. Hence we must not expect that
this potentially useful capability can be infallibly implemented.

Experimentation  has  indicated,  however,  that  for  an 
important class of programs the constraints actually generated
are quite tractable [Clar 76].

Testing  and  verification  capabilities  can  be  achieved 
by attempting to solve constraints embodying error conditions
and statements of intent. Thus, for example, if we
create  a  predicate  constraining  the  subscript  i  to  be  "i  < 
1"  at  statement  8,  we  are  specifying  an  out-of-bounds  array 
reference  error.  This  constraint  is  clearly  inconsistent 
with  the  constraint  "i  >  1"  attached  to  edge  (7,8).  Hence 
it  is  impossible  for  the  first  array  subscript  at  statement 
8  to  be  below  bounds.  Hence  we  have  shown  that  one  of  the 
tests generated in Figure 2 is superfluous. A symbolic execution
of a path from node 1 through node 8 will similarly
show  that  testing  i  against  20  is  superfluous  for  that  path. 
The  dynamic  test  for  that  error  condition  can  be  safely 
removed if it is shown that all paths through node 8 must
create constraints inconsistent with "i > 20." In this example
that is the case because procedure init does not alter
the value of h and init is always invoked with h = 20.
These facts can be inferred from static analysis. Hence a
combination of static analysis, symbolic execution and constraint
solution can be used to eliminate statement E1 of
Figure 2. Similar arguments can be used to eliminate statements
E4, E7, E5, E8, E10, E11, E12, E13, E14, E15, E19 and
E22.
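
The inconsistency argument can be sketched with simple interval reasoning in Python. This is an illustration under assumptions, not the method of any tool cited: constraints are restricted to comparisons of one integer variable with a constant, and the path constraint sets shown (bounds 1 <= i <= 20 on every path into the array reference, and a path on which only an unrelated variable k is constrained) are hypothetical.

```python
import math

def feasible(constraints):
    """True if some integer assignment satisfies every (var, op, const) triple."""
    low, high = {}, {}
    for var, op, const in constraints:
        if op not in (">", ">=", "<", "<=", "=="):
            raise ValueError(f"unsupported operator {op}")
        if op in (">", ">=", "=="):
            bound = const + 1 if op == ">" else const
            low[var] = max(low.get(var, -math.inf), bound)
        if op in ("<", "<=", "=="):
            bound = const - 1 if op == "<" else const
            high[var] = min(high.get(var, math.inf), bound)
    return all(low.get(v, -math.inf) <= high.get(v, math.inf)
               for v in set(low) | set(high))

def probe_is_removable(paths, error):
    """A probe can be dropped when its error constraint is infeasible on every path."""
    return all(not feasible(path + [error]) for path in paths)

paths_into_reference = [[("i", ">=", 1), ("i", "<=", 20)]]
print(probe_is_removable(paths_into_reference, ("i", "<", 1)))
print(probe_is_removable(paths_into_reference, ("i", ">", 20)))

paths_with_free_input = [[("k", "==", 1)]]
print(probe_is_removable(paths_with_free_input, ("i", "<", 1)))
```

When the error constraint is consistent with some path, as in the last call, the probe must stay, and any solution of the combined constraints is a test input that provokes the error.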


Statements E8 and E15 are particularly interesting. It
could be argued that static analysis is sufficient to eliminate
these subscript checking probes as well. The subscripts
being checked here, however, are functions of program
variables. Surely static analysis rules could be devised
for each of these situations, but other rules would
have to be devised for other common occurrences. The result
would be an inelegant mass of special procedures. A symbolic
trace, on the other hand, easily shows all functional
relations, and readily expresses the needed range checking
tests directly in terms of the input values. Thus the symbolic
execution/constraint solving approach provides an
elegant technique which avoids the need for the inelegant
special-cases approach.

It is important to note that we have analytically justified
the removal of virtually all subscript checking
probes  from  the  program  in  Figure  2.  In  particular,  all 
probes  inserted  to  check  the  subscripts  of  statements  8,  10 
and  17  can  be  removed.  Hence  we  have  verified  that  these 
statements  correctly  reference  array  A. 

Although statement E16 is a probe for a different error
(division by zero), it should be apparent that the analytic
technique just described can be used to show that the test
embodied in E16 is also unnecessary. This error condition
is expressed as the constraint "xk=0." This will be inconsistent
with any constraint set arising from symbolic execution
of a path through node 14. Yet static analysis will
show that node 14 must always be executed prior to node E16.
Hence it is verified that the division in statement 18 is
always well defined.

Probes E17, E18, E20 and E21 cannot be removed, however.
In fact symbolic execution of a path such as 34, 35,
36, 21, 22, 23, 24, 25 yields only the following constraints [1]:

③ ≠ 0 (from edge (35,36))

③ = 1 (from edge (24,25))

[1] The notation ⓘ should be read as "the ith value taken
as input to this path." Hence in this case ③ means "the
third value read in."

Thus clearly when statement 25 is encountered ③ is constrained
to be 1, but ① and ② are subject to no constraints.
An out-of-bounds subscript error at statement 25 could be
simulated by any of the constraints i<1, i>20, j<1, or j>20.
After symbolic execution these become ① < 1, ① > 20, ② < 1 and
② > 20. None of these is inconsistent with the constraints
generated by consideration of path edges. Hence a solution
such as


can  clearly  force  execution  of  an  array  subscript  reference 
error  at  statement  25.  Thus  we  see  that  the  symbolic 
execution/ constraint  solving  technique  is  a  powerful  testing 
aid.  It  should  be  noted  that  the  ATTEST  system  tClar  76] 
implements  most  of  the  capabilities  just  described. 
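The consistency question posed above can be illustrated with a small sketch. It is written in Python rather than the Fortran of Figure 2, it is not the ATTEST algorithm, and it replaces a real constraint solver with exhaustive search over a small integer range chosen arbitrarily by us. The names path_constraints, ERROR_CONSTRAINTS and find_witness are hypothetical. The only facts taken from the text are that the third input is constrained to be 1 and that the error conditions are i<1, i>20, j<1 and j>20, with i and j being the first and second inputs.

```python
from itertools import product

def path_constraints(v1, v2, v3):
    # Constraint induced by the path edges: the third input must equal 1.
    # The first and second inputs are left unconstrained.
    return v3 == 1

# Out-of-bounds conditions at statement 25, written over the input values.
ERROR_CONSTRAINTS = {
    "i < 1": lambda v1, v2, v3: v1 < 1,
    "i > 20": lambda v1, v2, v3: v1 > 20,
    "j < 1": lambda v1, v2, v3: v2 < 1,
    "j > 20": lambda v1, v2, v3: v2 > 20,
}

def find_witness(error, domain=range(-2, 24)):
    """Return an input triple satisfying path and error constraints, or None.

    Exhaustive search over a small domain is an assumption made for
    illustration; a practical tool would use a constraint solver.
    """
    for v1, v2, v3 in product(domain, repeat=3):
        if path_constraints(v1, v2, v3) and error(v1, v2, v3):
            return (v1, v2, v3)
    return None

witnesses = {name: find_witness(test) for name, test in ERROR_CONSTRAINTS.items()}
for name, witness in witnesses.items():
    print(name, "is forced by input", witness)
```

Each witness found is a candidate test case: it drives execution down the path and violates the array bounds, which is why the corresponding probes must be retained.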

Perhaps  the  most  important  use  of  symbolic 
execution/constraint  solution  is  as  a  technique  for  verify¬ 
ing  assertions  of  functional  relations  between  program  vari¬ 
ables.  At  the  end  of  the  previous  section  it  was  noted  that 
verification  of  assertions  such  as  A2,  A3,  A5  and  A6  is 
beyond  the  power  of  the  static  analyzers  which  had  been 
presented.  We  saw  that  static  analysis  is  quite  adept  at 
inferring  all  the  possible  sequences  of  events  which  might 
arise  during  execution  of  a  program,  and  that  by  comparing 
these  with  specifications  of  correct  and  incorrect 
sequences,  testing  and  verification  capabilities  are 
obtained.  When  the  statements  of  correct  behavior  are 
couched  as  predicates  involving  program  variables,  however, 
symbolic execution/constraint solution is most useful. This
is  not  surprising,  as  symbolic  execution  is  a  technique  for 
tracing  and  manipulating  the  functional  relations  between 
program  variables. 

We have already discussed the fact that the subscript
references at statements 25 and 27 may cause array bounds
violations. This was determined by using symbolic
execution/constraint solution to demonstrate that probes
P5,1 and P6,1 are not inconsistent with path induced
constraints. Thus they cannot safely be removed and assertions
A5 and A6 cannot be verified.

On the other hand, these techniques can help verify the
correctness of assertions A2 and A3. By using symbolic
execution for the path 10, 11, 10, we obtain the relation

     A(i,j,1) = A(i,j-1,1) + i

Viewing this as a recurrence relation whose initial
condition is given by

     A(i,1,1) = i

we can obtain the analytic solution

     A(i,j,1) = j * i

from the theory of finite difference equations. This
relation is exactly the one asserted by A2. Hence this
assertion is analytically verified and need not be dynamically
verified. Clearly, this capability rested heavily upon
being able to draw on results from finite mathematics.
Cheatham has created a tool with impressive inferential
capabilities of this sort [Chea 78], although the problem of
determining the closed form of a recurrence is in general
intractable. Also required here is the ability to recognize
when two formulas are equivalent. This problem is likewise
intractable in general.
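As a plausibility check on the closed form, the recurrence can be unrolled numerically and compared with j * i. The sketch below is in Python, the function names are ours, and the range 1 to 20 is taken from the subscript bounds mentioned earlier. Such a spot check is not a substitute for the analytic argument; it only confirms agreement on the values tried.

```python
def a_by_recurrence(i, j):
    """Evaluate A(i,j,1) by unrolling A(i,j,1) = A(i,j-1,1) + i."""
    value = i                      # initial condition A(i,1,1) = i
    for _ in range(2, j + 1):
        value += i                 # one more traversal of the loop path
    return value

def closed_form(i, j):
    """The relation asserted by A2: A(i,j,1) = j * i."""
    return j * i

agrees = all(
    a_by_recurrence(i, j) == closed_form(i, j)
    for i in range(1, 21)
    for j in range(1, 21)
)
print("recurrence matches closed form on 1..20:", agrees)
```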

Additional pitfalls of demonstrating functional
equivalence are illustrated by assertion A3. Here we
easily see that symbolic execution will establish that after
statement 17

     A(i,j,2) = A(i,j,1)/2.0

This is mathematically equivalent to the equation

     A(i,j,2) = 0.5*A(i,j,1),

and is readily recognized as being equivalent. Because of
the peculiarities of floating point hardware, however, the
two formulas

     A(i,j,1)/2.0 and 0.5*A(i,j,1)

may evaluate to different values on some machines. Hence the
results of symbolic verification and dynamic verification may
differ.
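The gap between algebraic and floating point equivalence can be observed directly. The sketch below is in Python and assumes IEEE 754 binary arithmetic, which is not the hardware this paper had in view. Under that assumption division by 2.0 and multiplication by 0.5 happen to agree, because 0.5 is an exact power of two. For that reason the sketch also compares x/3.0 with x*(1.0/3.0), a pair of our own choosing that is algebraically equivalent yet does differ for some inputs.

```python
def halving_agrees(samples):
    """True if x/2.0 and 0.5*x give identical results for every sample."""
    return all(x / 2.0 == 0.5 * x for x in samples)

def thirds_mismatches(samples):
    """Return the samples for which x/3.0 and x*(1.0/3.0) differ."""
    third = 1.0 / 3.0              # already rounded; not exactly one third
    return [x for x in samples if x / 3.0 != x * third]

samples = [float(n) for n in range(1, 101)]
same_for_two = halving_agrees(samples)
different_for_three = thirds_mismatches(samples)

print("x/2.0 == 0.5*x on all samples:", same_for_two)
print("samples where x/3.0 != x*(1.0/3.0):", len(different_for_three))
```

A symbolic executor that treats such pairs as interchangeable will therefore certify a relation that a dynamic probe comparing computed values could reject.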

Despite  these  various  limitations  we  are  encouraged  to 
believe  that  symbolic  execution/constraint  solution  can  be 
used  to  yield  impressive  documentation,  testing  and  verifi¬ 
cation  capabilities.  Perhaps  these  limitations  can  be  put 
in  better  perspective  by  observing  that  symbolic  execution 
and  constraint  solution  are  the  basic  techniques  used  in 
formal  verification  or  so  called  "proof  of  correctness" 
([Elsp  72],  [Lond  75],  [Hant  76]). 

In formal verification the intent of a program must be
captured totally by assertions imbedded according to the
dictates of a criterion such as the Floyd Method of
Inductive Assertions [Floy 67]. The correctness
verification is established by symbolically executing all code
sequences  lying  between  consecutive  assertions  and  showing 
that  the  results  obtained  are  consistent  with  the  bounding 
assertions.  The  consistency  demonstration  is  generally 
attempted  by  using  predicate  calculus  theorem  provers  rather 
than  constraint  solvers  as  discussed  here. 

It is crucial to observe that these theorem provers
are subject to the same theoretical limitations discussed
earlier. Because the First Order Predicate Calculus is
undecidable, no procedure is guaranteed to determine whether
an arbitrary formula is a theorem. Hence we cannot be
guaranteed of an answer to the question of whether a symbolic
execution will yield results consistent with its bounding
assertions. Furthermore, the symbolic execution may make
simplifications and transformations of real formulas which
do not recreate the functioning of floating point hardware.
These and similar limitations of formal verification have
long been acknowledged. Yet formal verification is still rightly
regarded  as  a  useful  technique  capable  of  increasing  one's 
confidence  in  the  functional  soundness  of  a  program.  This 
is  exactly  the  sense  in  which  the  symbolic 
execution/constraint  solution  technique  just  discussed 
should  be  considered  worthwhile. 

In  fact,  this  technique  is  of  more  worth  to  a  practi¬ 
tioner  than  formal  verification,  because  of  its  flexibility. 
As  already  observed,  formal  verification  requires  a  com¬ 
plete,  exhaustive  statement  of  a  program's  intent.  The 
technique  just  described  focuses  on  attempting  to  justify  or 
disprove  the  validity  of  individual  assertions.  This  gives 
the  practitioner  the  ability  to  probe  various  individual 
aspects  of  his  program  as  he  may  desire.  From  this  perspec¬ 
tive  we  view  formal  verification  as  the  logical,  orderly 
culmination  of  a  process  of  verifying  progressively  more 
complete assertion sets. This culmination is rarely reached
due to its prohibitive costs.


V. A STRATEGY FOR INTEGRATING TOOL CAPABILITIES. In
this section we propose some ways in which the preceding
classes of tools can be combined to address important
software implementation objectives. It seems that in
creating software the overriding goal is to create a product
which  demonstrably  meets  its  current  objectives  and  shows 
promise  of  being  adaptable  to  meet  foreseeable  changes  in 
the  objectives.  Much  research  and  experimentation  has  been 
devoted  to  studying  how  to  achieve  this  goal,  and  much  is 
yet  to  be  understood.  From  this  past  work,  however,  a  view 
of  the  software  development  activity  can  be  safely  advanced. 
A  possible  diagram  of  this  view  of  the  software  production 
activity  is  shown  in  Figure  6. 

From  this  diagram  it  is  clear  that  the  activity  should 
be  greatly  facilitated  by  automated  aids  to  documentation, 
testing  and  verification.  The  preceding  sections  have  pro¬ 
vided  a  basis  for  seeing  how  such  automated  aids  can  be 
fashioned  from  a  coalition  of  static  analysis,  symbolic  exe¬ 
cution  and  dynamic  testing  aids.  We  now  propose  some 
details. 

A.  Documentation 

A  complete  set  of  program  documentation  must  fully 
describe  the  structure  and  functioning  of  the  program. 
Clearly  such  a  set  must  describe  a  wide  variety  of  aspects 
of  the  program.  At  present  it  seems  that  certain  of  these 
items  of  description  must  inevitably  be  supplied  by  humans. 
The  previous  sections  of  the  paper  have  shown,  however,  that 
some  documentation  can  be  generated  by  tools.  This  documen¬ 
tation  is,  moreover,  probably  more  reliably  and  cheaply  done 
by  such  tools.  In  addition,  if  some  documentation  is  done 
by  tools,  the  remaining  documentation  is  likely  to  be  done 
more  carefully  by  humans,  thereby  suggesting  the  possibility 
of  greater  quality  and  reliability. 

Earlier  sections  of  this  paper  suggest  that  static 
analysis  tools  should  be  used  first  to  create  such  documen¬ 
tation  as  cross  reference  tables,  variable  evolution  trees, 
and  input/output  descriptions  of  individual  variables  and 
procedures.  Symbolic  execution  tools  can  be  used  next  to 
create  descriptions  of  the  functional  effects  of  executing 
various  paths  through  the  code.  With  constraint  solution,  a 
complete  input/output  characterization  of  the  code  could  be 
obtained. Performance characteristics can be measured and
documented  with  the  aid  of  a  dynamic  testing  tool.  It  is 
proposed  that  all  this  documentation  be  stored  in  a  central 
data  base,  forming  a  skeleton  of  the  complete  documentation. 
Editors  and  interactive  systems  might  be  used  to  gather  from 
humans  such  things  as  text  descriptions  of  variables  and 
procedures.

Each of the three tool classes produces a different
kind  of  documentation.  The  types  of  documentation  are  only 
loosely  related,  hence  the  order  of  application  of  the  tools 
can  be  dictated  by  the  importance  of  each  to  the  particular 
project.  It  is  important  to  be  aware,  however,  that  static 
analysis  is  relatively  inexpensive,  symbolic  execution  is 
relatively  expensive,  constraint  solution  is  usually  quite 
expensive,  and  dynamic  testing  can  be  quite  expensive  if 
extensive  elaborate  test  runs  are  done. 

B.  Testing 

In  a  tool-assisted  testing  activity,  the  order  of 
application  of  the  tools  is  important.  We  have  seen  that 
tools  can  be  used  to  focus  the  testing  effort  on  paths  and 
situations  which  appear  to  be  more  error  prone.  This  is 
done  by  elimination  of  probes  which  were  created  to  test  for 
common  programming  errors  and  for  adherence  to  explicit 
assertions.  We  saw  that  many  probes  can  be  removed  by 
application  of  progressively  stronger  (and  more  costly) 
static  analysis.  Some  remaining  probes  may  be  removed  as  a 
result  of  symbolic  execution/constraint  solution.  We  saw 
that  these  probes  are  likely  to  be  the  more  substantive 
ones,  monitoring  for  adherence  to  asserted  functional 
intent.  Their  removal  constitutes  significant  verification, 
but it can be expected that the cost of this will be
relatively high. Hence symbolic execution should probably be
employed cautiously or not at all as a test aid.

Finally,  a  dynamic  test  tool  should  be  used  to  gather 
definite  information  about  the  existence  and  sources  of 
error  in  the  program.  As  already  noted,  testing  can  only 
show  the  presence  of  error  in  a  test  case,  and  even  a  simple 
program  may  have  an  infinite  number  of  possible  test  cases. 
Hence  the  tool  aided  procedure  just  outlined  has  added 
importance  in  that  it  helps  suggest  test  cases  -  namely 
those  designed  to  exercise  probes  not  analytically  removed. 
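The ordering just described, with cheaper analyses applied first and dynamic testing aimed at whatever probes survive, can be sketched as follows. This is a Python illustration under stated assumptions: the function eliminate_probes and the two lookup sets are hypothetical stand-ins for real analyzers, and the probe names X1 and X2 are invented. E16, E17 and E20 are used only in the roles the earlier discussion gives them.

```python
def eliminate_probes(probes, analyzers):
    """Apply analyzers in order (cheapest first) and report what each removes.

    analyzers is a list of (stage name, predicate) pairs; a predicate
    returns True when that stage can show the probe never fires.
    """
    remaining = list(probes)
    removed_by = {}
    for stage, can_remove in analyzers:
        removed_by[stage] = [p for p in remaining if can_remove(p)]
        remaining = [p for p in remaining if not can_remove(p)]
    return removed_by, remaining

# Stand-ins for analysis results; a real tool would compute these.
STATIC_PROVABLE = {"X1", "X2"}     # hypothetical probes settled statically
SYMBOLIC_PROVABLE = {"E16"}        # settled by symbolic execution

analyzers = [
    ("static analysis", lambda p: p in STATIC_PROVABLE),
    ("symbolic execution", lambda p: p in SYMBOLIC_PROVABLE),
]

probes = ["X1", "E16", "E17", "X2", "E20"]
removed_by, to_test = eliminate_probes(probes, analyzers)
print("removed:", removed_by)
print("design test cases to exercise:", to_test)
```

Because each stage sees only the probes left by the previous one, the expensive stage is spent on the smallest possible set.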

We  have  seen  that  testing  and  verification  can  be 
closely  related  activities.  It  is  important  to  remember, 
however,  that  they  do  differ,  most  noticeably  in  their  goals 
and placement in the software production process. Testing
is  the  process  of  looking  for  errors.  It  should  be  viewed 
as  an  activity  which  occurs  frequently  during  code  produc¬ 
tion.  Verification  is  the  process  of  demonstrating  the 
absence  of  errors.  As  such  it  should  not  be  undertaken 
until  and  unless  testing  has  failed  to  uncover  errors.  Thus 
it  is  a  less  frequent,  more  critical  process,  usually  war¬ 
ranting  greater  expense  and  thoroughness.  Our  earlier  dis¬ 
cussion  has  shown  specific  ways  in  which  verification 
results  can  be  obtained  as  outgrowths  of  testing  activities. 
We have also seen, however, that some activities provide
good verification results but are likely to be relatively
costly. Because verification is a less frequent, more
critical activity the extra cost may well be warranted.

C.  Verification 

A  verification  activity  should  start  out  like  the  test¬ 
ing  activity  just  described.  The  first  step  is  to  suppress 
error  testing  probes  and  probes  resulting  from  assertions. 
Static  analysis  can  be  used  to  suppress  some  probes,  but  the 
most  significant  probes  probably  can  be  removed  only  by  sym¬ 
bolic  execution.  Verification  is  achieved  on  an  assertion- 
by-assertion  basis  only  when  all  probes  generated  by  a  sin¬ 
gle  assertion  have  been  removed.  In  this  way  stronger  more 
complete  verification  can  be  obtained  incrementally  at 
greater  cost  and  effort.  Complete  formal  verification  can 
be  attempted  if  desired  as  the  culmination  of  this  process. 
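The assertion-by-assertion criterion can be stated compactly. In the Python sketch below, the mapping from assertions to probes is an assumption made for illustration: P5,1 and P6,1 are named in the earlier discussion, while P2,1 and P3,1 are hypothetical names formed by analogy.

```python
def verified_assertions(probes_by_assertion, removed_probes):
    """Return the assertions all of whose probes have been removed."""
    removed = set(removed_probes)
    return sorted(
        assertion
        for assertion, probes in probes_by_assertion.items()
        if probes and all(p in removed for p in probes)
    )

# Illustrative data only.
probes_by_assertion = {
    "A2": ["P2,1"],
    "A3": ["P3,1"],
    "A5": ["P5,1"],
    "A6": ["P6,1"],
}
removed = ["P2,1", "P3,1"]

print(verified_assertions(probes_by_assertion, removed))
```

An assertion with even one surviving probe stays unverified, so the set of verified assertions can only grow as stronger analyses remove more probes.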

A  final  word  should  be  said  about  the  need  for  both 
verification  and  testing.  It  has  been  observed  that  testing 
cannot  demonstrate  the  absence  of  errors.  Hence  verifica¬ 
tion  should  be  attempted.  We  have  also  observed  that  the 
verification  process  has  its  own  risks.  The  most  important 
risk  is  that  an  assertion  verification  attempt  may  end 
inconclusively  because  of  the  failure  to  determine  the  con¬ 
sistency  of  constraints  or  the  truth  of  a  theorem.  As 
already  noted,  this  does  not  necessarily  signify  the  falsity 
of  the  assertion,  just  that  the  verification  attempt  ended 
inconclusively.  Another  important  risk  is  that  the  verifi¬ 
cation  may  be  successful  but  rely  implicitly  upon  false 
assumptions  about  the  semantics  of  language  constructs.  As 
an  example  of  this,  we  saw  that  symbolic  executors  generally 
make  incorrect  simplifying  assumptions  about  the  functioning 
of  floating  point  hardware.  As  a  result  even  a  complete 
formal  verification  of  program  correctness  may  not  com¬ 
pletely  rule  out  the  possibility  of  an  execution-time  error. 
Hence  it  seems  that  both  testing  and  verification  should  be 
considered  techniques  for  raising  the  confidence  of  project 
personnel  in  the  software  product.  Each  is  capable  of  bol¬ 
stering  confidence  in  its  own  way,  and  neither  should  be 
employed  to  the  exclusion  of  the  other. 


VI. THE ARCHITECTURE AND DESIGN OF TOOLPACK. We now
briefly summarize progress to date (March 1981) in designing
a specific configuration of tools to meet many of the
objectives just described. This tool configuration, named
TOOLPACK[2], has as its objective the conveyance of strong
comprehensive  tool  support  to  programmers  who  are  writing, 
testing,  transporting  or  analyzing  mathematical  software. 
Hence  it  must  provide  strong  support  for  documentation, 
testing,  and  verification,  as  well  as  such  code  creation 
activities  as  editing. 

It  was  decided  that  it  would  be  prudent  to  address  some 
specific  needs  of  this  well-established  community  as  a 
prelude  to  attempting  to  address  the  general  needs  of  a  more 
general  community,  because  of  the  lack  of  experience  in 
building and studying such large configurations of
tools.  It  is  anticipated  that  experience  with  systems  such 
as  TOOLPACK  will  eventually  lead  to  the  establishment  of 
guidelines  for  production  of  other,  perhaps  more  general, 
tool configurations.

The following summary is extracted from [Oste 81],
wherein  additional  details  can  be  found. 

Before  commencing  with  description  of  the  design,  it  is 
important  to  enunciate  the  following  basic  assumptions: 

1.  The  mathematical  software  whose  writing,  testing, 
and  analysis  is  to  be  supported  by  TOOLPACK  is  to  be  written 
in  a  dialect  of  Fortran  77,  which  shall  be  carefully  chosen 
to  span  the  needs  of  as  broad  and  numerous  a  user  community 
as is practical.

2.  TOOLPACK  is  to  be  designed  to  provide  cost  effec¬ 
tive  support  for  the  production  by  up  to  3  programmers  of 
programs  whose  length  is  up  to  5000  lines  of  source  text. 
TOOLPACK may be less effective in supporting larger
projects.

3. TOOLPACK is to be designed to provide cost effective
support for the analysis and transporting of programs
whose length is up to 10,000 lines of source text. TOOLPACK
may be less effective in supporting larger projects.

4. TOOLPACK will support users working in either batch
or interactive mode, but may offer stronger, more flexible
support to interactive users.


[2] TOOLPACK is a cooperative project involving researchers
at Argonne National Laboratories, Bell Telephone
Laboratories, International Mathematical and Statistical
Libraries, Inc., Jet Propulsion Laboratory, Numerical
Algorithms Group, Ltd., Purdue University, University of
California at Santa Barbara and University of Colorado. The
project is being funded by the Dept. of Energy and the
National Science Foundation, as well as the participating
institutions.

A.  Overview 

A  primary  motivating  goal  of  the  design  proposed  here 
is  that  user  support  be  supplied  in  as  direct  and  painless  a 
fashion  as  is  feasible.  In  particular,  the  design  attempts 
to  relieve  the  user  of  having  to  understand  the  natures  and 
idiosyncrasies  of  individual  TOOLPACK  tools.  It  also 
relieves  the  user  of  the  burden  of  having  to  combine  or 
coordinate  these  tools.  Instead  the  design  encourages  the 
user  to  express  his  needs  in  terms  of  the  requirements  of 
his  own  software  job.  The  TOOLPACK  support  system  is 
designed  to  then  ascertain  which  tools  are  necessary,  prop¬ 
erly  configure  those  tools,  and  present  the  results  of  using 
the  tools  to  the  user  in  a  convenient  form. 

The  design  encourages  the  user  to  think  of  TOOLPACK  as 
an  energetic,  reasonably  bright  assistant,  capable  of 
answering  questions,  performing  menial  but  onerous  tasks  and 
storing  and  retrieving  important  bodies  of  data.  The  aim  of 
this  is  to  make  humans  more  effective  in  creating,  document¬ 
ing,  testing  and  verifying  program  code. 

In  order  to  reach  this  view,  the  user  should  think  of 
TOOLPACK  as  a  vehicle  for  establishing  and  maintaining  a 
file  system  containing  all  information  important  to  the 
user,  and  using  that  file  system  to  both  furnish  input  to 
needed  tools  and  capture  the  output  of  those  tools. 
Clearly,  such  a  file  system  is  potentially  quite  large  and 
is  to  contain  a  diversity  of  stored  entities.  Source  code 
modules  would  certainly  reside  in  the  file  system,  but  so 
would such more arcane entities as token lists and flowgraph
annotations. In order to keep TOOLPACK's user image
as  straightforward  as  possible  this  design  proposes  that 
most  file  system  management  be  done  automatically  and  inter¬ 
nally  to  the  TOOLPACK  system,  out  of  the  sight  and  sphere  of 
responsibility of the user. In addition, the user is to be
encouraged to access only a relatively small number
of files - only those such as source code modules and test
data sets which are of direct concern to him. The user may
create,  delete,  alter  and  rename  these  entities.  More 
important,  however,  the  user  may  manipulate  these  entities 
with  a  set  of  commands  which  selectively  and  automatically 
configure  and  actuate  the  TOOLPACK  tool  ensemble.  The  com¬ 
mands  are  designed  to  be  easy  to  understand  and  use.  They 
borrow  heavily  on  the  terminology  used  by  a  programmer  in 
creating and testing code, and conceal the sometimes
considerable tool mechanisms needed to effect the results
desired by the user.

B.  User  Visible  File  System  Entities 

In  order  to  encourage  and  facilitate  the  preceding  view 
of  TOOLPACK,  the  system  will  support  the  naming,  storage, 
retrieval,  editing  and  manipulation  of  the  following  classes 
of  entities,  which  should  be  considered  to  be  the  basic 
objects  of  TOOLPACK: 

1. Program Units:

A  TOOLPACK  program  unit  (PU)  is  the  same  as  a  Fortran 
program  unit,  except  that  TOOLPACK  will  require  a  number  of 
representations  of  the  program  unit  other  than  the  source 
code (e.g., the corresponding token list and parse tree).
The  identity,  significance,  and  utilization  of  these  other 
representations  are  to  be  made  transparent  to  the  casual 
user.  They  will  be  managed  automatically  by  the  TOOLPACK 
system.  On  the  other  hand,  they  will  be  accessible  and 
usable  by  more  expert  users  through  published  standard  nam¬ 
ing  conventions  and  accessing  functions. 

2.  Execution  Units: 

Any set of TOOLPACK program units which the user
chooses to designate can be grouped into a TOOLPACK
execution unit (EU). Other execution units may also be named as
constituents  of  an  execution  unit,  as  long  as  no  circularity 
is  implied  by  such  definitions.  Ordinarily  it  is  expected 
that  an  execution  unit  will  be  a  body  of  code  which  is  to  be 
tested  as  part  of  the  incremental  construction  process. 
Hence  an  execution  unit  might  be  a  set  of  newly  coded  pro¬ 
gram  units  and  a  test  harness.  It  is,  however,  not  unrea¬ 
sonable  (and  indeed  potentially  quite  useful)  to  consider  a 
subprogram  library  to  be  an  execution  unit  as  well.  Here, 
too,  a  TOOLPACK  execution  unit  will  consist  of  more  than 
just  source  text,  but  the  user  will  not  need  to  be  aware  of 
the  existence  of  any  such  additional  entities. 
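The requirement that nested execution unit definitions imply no circularity can be checked mechanically. The following Python sketch is an illustration only and does not describe TOOLPACK's implementation: the function expand_units, the dictionary representation, and all unit names are hypothetical.

```python
def expand_units(eu_name, definitions, _active=None):
    """Return the set of program units reachable from an execution unit.

    definitions maps each EU name to a list of members. A member that is
    itself a key of definitions is a nested EU; any other member is taken
    to be a PU. A circular definition raises ValueError.
    """
    active = _active if _active is not None else []
    if eu_name in active:
        cycle = " -> ".join(active[active.index(eu_name):] + [eu_name])
        raise ValueError("circular execution unit definition: " + cycle)
    active.append(eu_name)
    units = set()
    for member in definitions[eu_name]:
        if member in definitions:
            units |= expand_units(member, definitions, active)
        else:
            units.add(member)
    active.pop()
    return units

# A subprogram library used as a constituent of a test configuration.
definitions = {
    "LIB": ["SOLVE", "NORM"],
    "TEST1": ["LIB", "DRIVER"],
}
print(sorted(expand_units("TEST1", definitions)))
```

Tracking the chain of units currently being expanded, rather than every unit ever visited, lets the same library appear under several execution units without being mistaken for a cycle.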

An execution unit may also include optional
transformation specifications in order to enable users
to painlessly apply canonical transformations to their code.
This  will  facilitate  such  functions  as  porting  of  code  and 
coding  in  higher  level  pseudolanguages  and  languages  such  as 
RATFOR  [Kern  75]  and  EFL  [Feld  78].  The  preferred  syntax 
for  specifying  the  attachment  of  such  transformation  specif¬ 
ications  to  an  EU  has  not  yet  been  decided  upon.  It  does 
seem  clear,  however,  that  this  specification  would  be 
straightforward  to  accomplish  if  the  TOOLPACK  command 
language were functional in structure.

3. Test Data Collections:

A TOOLPACK test data collection (TDC) is a collection
of test data sets to be used in exercising one or more
TOOLPACK execution units. A test data collection may consist of
one  or  more  sets  of  the  complete  input  data  needed  to  drive 
the  execution  of  some  EU.  Each  test  input  data  set  may  also 
have  associated  with  it  a  specification  of  the  output  which 
is expected in response to processing of the specified
input.
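One possible shape for a test data collection, and for exercising a unit with it, is sketched below in Python. The class names DataSet and DataCollection, the function exercise, and the use of an ordinary callable in place of an execution unit are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

@dataclass
class DataSet:
    inputs: tuple                      # complete input for one run
    expected: Optional[Any] = None     # expected output, if one was given

@dataclass
class DataCollection:
    name: str
    data_sets: List[DataSet] = field(default_factory=list)

def exercise(unit: Callable, tdc: DataCollection):
    """Run the unit on every data set and compare with expected output."""
    results = []
    for ds in tdc.data_sets:
        actual = unit(*ds.inputs)
        if ds.expected is None:
            verdict = "unchecked"      # no expected output was specified
        else:
            verdict = "pass" if actual == ds.expected else "fail"
        results.append((ds.inputs, actual, verdict))
    return results

tdc = DataCollection("sums", [
    DataSet((1, 2), expected=3),
    DataSet((2, 2), expected=5),
    DataSet((0, 0)),
])
verdicts = [v for _, _, v in exercise(lambda a, b: a + b, tdc)]
print(verdicts)
```

Keeping the expected output optional mirrors the description above, in which a data set may or may not carry a specification of the expected response.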

4. Options Packets:

A  TOOLPACK  options  packet  (OP)  is  a  set  of  directives 
specifying  which  of  the  many  anticipated  options  are  to  be 
in  force  for  a  particular  invocation  of  one  of  the  TOOLPACK 
tools. We see, for example, the need for Test Option
Packets (TOP's) to specify dynamic testing probe insertion
options.

The  reason  for  defining  these  four  entities  as  being 
basic  to  TOOLPACK  is  that  they  seem  to  facilitate  the  key 
processes  of  creating,  transporting,  documenting,  testing, 
and  verifying  program  code  by  giving  the  user  considerable 
power  and  very  broad  flexibility  in  specifying  how  these 
activities  are  to  be  done.  This  design  is  intended  to  make 
it  straightforward  for  the  user  to  manipulate  programs  and 
to designate any body of code as the object of documentation,
testing and verification; and to make it easy for the user
to  select  various  degrees  of  rigor  and  thoroughness  in 
analyzing  and  testing  that  code  by  exercising  it  with  test 
data  sets  selected  from  the  file  system.  This  can  perhaps 
best  be  seen  by  introducing  the  TOOLPACK  command  set  and 
indicating  how  it  is  to  be  used  to  manipulate  these  named 
data  entities. 

C.  The  TOOLPACK  Command  Language 

As  indicated  earlier,  the  exact  syntax  for  the  TOOLPACK 
command  language  has  not  been  established  and  is  still  under 
study.  A  decision  on  a  specific  syntax  will  be  made  in  the 
near  future,  and  is  likely  to  reflect  our  current  predispo¬ 
sition  towards  functional  notation. 

Currently  we  are  in  a  position  to  specify  much  of  the 
semantic  content  of  this  language.  In  the  following  sec¬ 
tions  we  name  generically  and  characterize  generally  the 
major primitive functional capabilities currently
anticipated.

The proposed TOOLPACK functional primitive set seems to
divide logically into four parts: file system management
primitives, edit (synthesis) primitives, tool application
(analysis) primitives, and perusal primitives. In the
following subsections, the needed primitives will be discussed
individually. Specific names and syntax are attached where
necessary only as an aid to the discussions, rather than as
a concrete proposal.

1.  Data  Base  Manipulation  Primitives 

a. NEW entity

Invocation  of  this  primitive  results  in  the  creation 
within  the  TOOLPACK  file  system  of  a  specific  entity,  named 
as an argument to the primitive, which is either a PU, EU,
TDC, or OP, as specified by the user. It is proposed that a
TOOLPACK  entity  name  be  qualifiable  by  a  structured  qualifi¬ 
cation  scheme  facilitating  the  process  of  keeping  backup 
versions,  formatted  versions,  and  transformed  versions  of 
code,  and  variously  instrumented  versions  of  EU's. 

In  addition,  to  simplify  creating  the  named  entity 
within  the  TOOLPACK  data  base,  it  is  proposed  that  the  NEW 
invocation  also  automatically  invoke  a  special  purpose 
tutorial/editor to assist the user in creating the desired
entity.  This  is  currently  under  discussion  and  not  a  firm 
part  of  the  design.  In  the  case  of  a  NEW  PU,  that  would  be 
a  text  editor  or  Fortran  intelligent  editor.  Special  pur¬ 
pose  editors  might  also  be  built  to  support  the  creation  of 
NEW EU's, OP's and TDC's as well.

b.  OLD  entity 

Invocation  of  this  primitive  will  cause  the  retrieval 
of  the  named  entity  and  invocation  of  the  editor  appropriate 
for  the  type  of  the  named  entity. 

c.  DELETE  entity 

Invocation  of  this  primitive  results  in  the  named 
entity being marked for deletion by the TOOLPACK file
system.


d. REPLACE

Invocation  of  this  primitive  results  in  the  entity 
currently  being  edited  being  stored  back  in  the  TOOLPACK 
file  system.  Ordinarily,  the  edited  source  image  supplants 
the  unedited  source  image  and  any  currently  stored  images 
which  have  been  derived  from  the  source  are  automatically 
deleted  by  the  TOOLPACK  file  management  system.  If,  how¬ 
ever,  a  new  entity  name  is  specified  as  an  argument  to  this 
function,  then  the  currently  stored  entity  is  left 
untouched,  and  a  new  entity  is  created  and  initialized  to 
consist  of  the  newly  edited  source  text. 


e. RENAME


Invocation  of  this  primitive  simply  changes  the  name  of 
an  entity. 
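The semantics of these primitives can be summarized in a small model. The Python sketch below is our own illustration, not the TOOLPACK design: the class FileSystem, its method names, and the in-memory dictionaries are hypothetical, and the editor invocation performed by NEW and OLD is omitted.

```python
class FileSystem:
    """Toy model of the data base manipulation primitives."""

    def __init__(self):
        self.source = {}     # entity name -> source image
        self.derived = {}    # entity name -> {image kind: derived image}
        self.marked = set()  # entities marked for deletion

    def _require(self, name):
        if name not in self.source:
            raise KeyError("no such entity: " + name)

    def new(self, name, text=""):
        if name in self.source:
            raise KeyError("entity already exists: " + name)
        self.source[name] = text
        self.derived[name] = {}

    def delete(self, name):
        self._require(name)
        self.marked.add(name)            # marked, not physically removed

    def replace(self, name, text, new_name=None):
        self._require(name)
        if new_name is None:
            self.source[name] = text
            self.derived[name] = {}      # derived images are now stale
        else:
            self.new(new_name, text)     # stored entity is left untouched

    def rename(self, old, new):
        self._require(old)
        if new in self.source:
            raise KeyError("entity already exists: " + new)
        self.source[new] = self.source.pop(old)
        self.derived[new] = self.derived.pop(old)
        if old in self.marked:
            self.marked.remove(old)
            self.marked.add(new)

fs = FileSystem()
fs.new("SOLVE", "v1")
fs.derived["SOLVE"]["tokens"] = ["v1"]
fs.replace("SOLVE", "v2")                       # supplants and invalidates
fs.derived["SOLVE"]["tokens"] = ["v2"]
fs.replace("SOLVE", "v3", new_name="SOLVE.NEW") # original is preserved
print(fs.source, fs.derived)
```

The point of interest is REPLACE: storing under the existing name discards every image derived from the old source, whereas storing under a new name leaves the old entity and its derived images intact.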

2.  Edit  (synthesis)  primitives 

EDIT  entity 

Invocation  of  this  functional  primitive  would  have 
basically  the  same  effect  as  that  of  the  OLD  primitive.  The 
appropriate  editing  capability  would  be  summoned,  and  the 
named  entity  would  be  retrieved  from  the  file  system  and 
readied  for  manipulation. 

It  is  important  to  point  out  that  this  latter  operation 
is  expected  to  require  a  considerable  amount  of  care  and 
sophistication if it is to be done effectively in the
general case. This is because of the TOOLPACK philosophy of
allowing some of the file system entities to be accessed
and manipulated individually. As a result it is conceivable
that  a  user  might  access  and  alter  a  PU's  parse  tree,  but 
never access the source text (or vice versa). In such a
case  it  would  be  necessary  for  the  TOOLPACK  file  management 
capability  to  be  aware  of  the  fact  that  inconsistency  had 
been  introduced  between  these  two  PU  versions.  Although 
this  inconsistency  may  be  tolerable  for  a  while,  a  strategy 
for  recognizing  how  and  when  to  remedy  it  will  have  to  be 
evolved.  For  example,  the  PU's  source  text  might  not  need 
to  be  altered  to  become  consistent  with  an  altered  parse 
tree  until  and  unless  the  user  were  to  attempt  to  access  the 
source  text  (e.g.,  through  OLD  or  EDIT). 
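The deferred remedy suggested in the example above, in which a stale representation is regenerated only when it is next accessed, can be sketched as follows. This Python illustration uses trivial stand-ins for the parser and unparser (whitespace splitting and joining); the class ProgramUnit and its members are hypothetical names, not part of the TOOLPACK design.

```python
def parse(text):
    """Stand-in parser: a 'parse tree' is just a list of tokens."""
    return text.split()

def unparse(tree):
    """Stand-in unparser: rebuild source text from the token list."""
    return " ".join(tree)

class ProgramUnit:
    def __init__(self, source):
        self._source = source
        self._tree = parse(source)
        self._stale = None            # None, "source", or "tree"
        self.regenerations = 0        # counts deferred rebuilds performed

    def set_tree(self, tree):
        self._tree = list(tree)
        self._stale = "source"        # source text is now out of date

    def set_source(self, text):
        self._source = text
        self._stale = "tree"          # parse tree is now out of date

    @property
    def source(self):
        if self._stale == "source":   # remedy the inconsistency on access
            self._source = unparse(self._tree)
            self._stale = None
            self.regenerations += 1
        return self._source

    @property
    def tree(self):
        if self._stale == "tree":
            self._tree = parse(self._source)
            self._stale = None
            self.regenerations += 1
        return self._tree

pu = ProgramUnit("X = 1")
pu.set_tree(["X", "=", "2"])
pu.set_tree(["X", "=", "3"])          # two tree edits, no rebuild yet
print(pu.regenerations, pu.source, pu.regenerations)
```

Repeated edits to one representation cost nothing until the other is requested, which is the behavior the example describes.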

More  complicated  situations  may  arise  in  carrying  out 
operations  on  EU's.  It  is  expected  that  users  will  be  able 
to  edit  the  source  text  of  an  entire  EU  by  being  given 
access  to  the  source  text  of  each  of  its  comprising  PU's.  A 
strategy  must  be  evolved,  however,  for  dealing  with  the 
impacts  of  changes  thereby  made  to  the  source  texts  of  PU's 
incorporated into several different EU's. Likewise, a strategy
is needed for correctly and efficiently effecting the
execution  of  an  EU  which  has  previously  been  executed,  but 
which  contains  some  constituent  PU's  which  have  been  EDIT'ed 
in  the  interim. 

In  solving  such  problems  we  intend  to  be  guided  to  good 
solutions  by  the  strategy  successfully  employed  in  the  MAKE 
capability  [Feld  79]. 
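
A MAKE-like check of this kind might look roughly like the sketch below. It assumes that every entity carries a monotonically increasing modification stamp; the function names and the dictionary layout are hypothetical and are not taken from TOOLPACK or from MAKE itself.

```python
def stale_units(module_stamp, pu_stamps):
    """Return names of PUs edited after the execution module was built.

    module_stamp: stamp of the last build, or None if never built.
    pu_stamps: dict mapping PU name -> stamp of its last EDIT.
    """
    if module_stamp is None:
        return sorted(pu_stamps)          # nothing built yet: process everything
    return sorted(name for name, stamp in pu_stamps.items()
                  if stamp > module_stamp)


def shared_users(pu_name, eu_members):
    """Return the EUs affected by a change to one PU.

    eu_members: dict mapping EU name -> set of constituent PU names.
    """
    return sorted(eu for eu, members in eu_members.items() if pu_name in members)
```

The first function addresses re-execution of an EU whose PU's were EDIT'ed in the interim; the second identifies the other EU's that share an altered PU.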

It  is  important  to  note  that  all  of  these  sorts  of 
problems  could  be  addressed  expediently,  but  inefficiently, 
by adopting a strategy of retaining little in the file system
and purging entities from it upon any suspicion that
they might become inconsistent. This strategy could be
adopted  for  early  releases  of  TOOLPACK  while  more  efficient 
strategies  are  evolved  and  tested. 

3.  Tool  Invocation  (Analysis)  Primitives 

These  primitives  invoke  the  functions  which  are  at  the 
heart  of  the  reason  for  the  TOOLPACK  project  -  namely  the 
facilitation  of  documentation,  testing  and  verification. 
Consequently,  great  pains  are  being  taken  to  make  them  easy 
to  understand  and  use.  In  an  important  sense,  the  rest  of 
the  TOOLPACK  primitive  set  has  been  designed  so  as  to  make 
these  tool  invocation  primitives  straightforward. 

a.  FORMAT  entity  [option  packet] 

Invocation of this primitive causes a named program
unit to be taken as input to the TOOLPACK formatting tool.
The resulting output text will supplant the original source
text, and any derived images of the original source text
will be deleted, unless a new entity name is also specified.
If a new entity name is supplied, the output of the
formatter will be stored in the TOOLPACK file system as
source text under this new name. It is expected that
option packets will be specifiable in order to facilitate
user selection from among many formatting options.

b.  STRUCTURE  entity  [option  packet] 

Invocation  of  this  primitive  has  the  same  effect  as 
invocation  of  the  FORMAT  command,  except  that  the  TOOLPACK 
structurer is invoked instead of the formatter.

c.  ANALYZE  entity  [option  packet] 

Invocation  of  this  primitive  results  in  the  static 
analysis  of  the  entity  named.  If  the  entity  is  a  program 
unit,  then  single  unit  analysis  will  be  performed.  If  the 
entity  is  an  execution  unit,  then  each  program  unit  will  be 
analyzed  individually  and  integration  analysis  will  also  be 
performed.

An options packet may be specified by the user. This
packet will enable the user to specify a level of thoroughness
which will cause analysis to go as far as the lexical
level, the syntactic level, the static semantic level or the
data flow level. If this specification is omitted, the
TOOLPACK system will select a default option (probably full
data flow analysis).
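
One way to picture the thoroughness levels is as an ordered scale in which each level implies all shallower ones. The sketch below assumes exactly that ordering and assumes full data flow analysis as the default, which the text gives only as probable; the names `Level` and `passes_for` are illustrative.

```python
from enum import IntEnum


class Level(IntEnum):
    LEXICAL = 1
    SYNTACTIC = 2
    STATIC_SEMANTIC = 3
    DATA_FLOW = 4


def passes_for(level=Level.DATA_FLOW):
    """List the analysis passes implied by a requested level."""
    # Each level includes every shallower level below it.
    return [lv.name for lv in Level if lv <= level]
```

For instance, `passes_for(Level.SYNTACTIC)` selects the lexical and syntactic passes only.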

The results of this analysis will be placed into an
entity-attribute-relational data base which will then be
available for perusal by a browsing subsystem to be
described subsequently, or for use as the basis for report
generation tools whose goal would be the creation of superior
documentation.

It should be clear that invocation of the ANALYZE command
will effect the marshalling and configuration of a considerable
assortment of tools and tool fragments. In addition,
the stronger forms of analysis will necessitate the
use of a number of intermediate images of the source text
(e.g., parse tree, flowgraph, callgraph). As stated earlier,
an important design criterion was that these maneuverings
and the materialization of these intermediate images be
concealed from the user and made the responsibility of the
TOOLPACK system.


d.  EXECUTE  TEST  EUname,  TDCname,  OPname 

Invocation of this primitive results in the dynamic
test execution of a specified execution unit on a collection
of test data sets. The test data sets comprising the test
data collection "TDCname" are fed into the execution module
derived from the execution unit "EUname" one at a time, with
the results of each execution being used to build an execution
history data base. This data base also would be used
to supply answers to user-posed questions as well as reports
needed for documentation purposes.

The user may optionally specify a test options packet
whose purpose is to select and specify which of the numerous
execution monitoring options are to be employed during the
test runs. The power and flexibility of the dynamic test
monitoring system are to be considerable (see [Feib 81]).
This is deemed to be necessary, but is also considered to be
a serious problem, in that a casual or novice user may be
intimidated by the variety of available choices. Hence it
is proposed that a set of standard Test Option Packets
(TOP's) be prepared by the builders of the dynamic test monitoring
system and stored permanently in the TOOLPACK file
system. Users could select from among these, tailor them to
individual needs by using the TOP editor, or create their
own TOP's from scratch. One of the standard TOP's would be
configured to be the default TOP, enabling the user to do
useful dynamic testing without needing to specify any TOP.
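
The one-at-a-time feeding of test data sets and the fallback to a default TOP can be sketched as below. Everything named here is an assumption made for illustration: `run_module` stands in for the instrumented execution module, and the contents of `DEFAULT_TOP` are invented.

```python
DEFAULT_TOP = {"name": "STANDARD", "trace_statements": True}  # hypothetical contents


def execute_test(run_module, test_data_collection, top=None):
    """Run each test data set through the execution module, one at a time.

    run_module: callable (data_set, top) -> result; stands in for the
                instrumented, compiled and linked execution unit.
    Returns the execution history as a list of records.
    """
    if top is None:
        top = DEFAULT_TOP                 # useful testing with no TOP specified
    history = []
    for number, data_set in enumerate(test_data_collection, start=1):
        result = run_module(data_set, top)
        history.append({"run": number, "top": top["name"],
                        "input": data_set, "result": result})
    return history
```

The returned list plays the role of the execution history data base; a real implementation would store it in the file system rather than return it.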

The actions occurring in response to an EXECUTE TEST
invocation are expected to be extensive and complex. Here
too, every effort has been made to conceal this complexity
from the user while still offering considerable flexibility
to construct testing situations and have them carried out
with minimal expense through extensive reuse of intermediate
data objects and entities.

D.  Perusal Primitives

TOOLPACK will ultimately contain tools to facilitate
the  examination  of  the  various  entities  in  the  TOOLPACK  file 
system.  This  document  has  already  described  various  special 
purpose  editors,  part  of  whose  purpose  will  be  to  facilitate 
examination  of  the  user-named  file  system  entities  (e.g., 
the  PU  source  text,  EU's,  OP's  and  TDC's). 

A different sort of tool is desirable for use in perusing
the output of the static analysis and dynamic testing
tools. As already noted, these tools will produce as output
sets of analytic and diagnostic packets which could profitably
be viewed as relational data bases. Browsing tools could be
constructed specifically to scan these data bases efficiently
for answers to expected queries. Existing text editors will
probably serve as primitive forerunners of these tools in early
releases of TOOLPACK.

Although  it  is  probable  that  there  will  eventually  be 
different  browsers  for  browsing  the  static  analysis  and 
dynamic  testing  data  bases,  it  is  expected  that  they  will 
both be invoked by the same command:

BROWSE [databasename]

The databasename is one which will be automatically
created by the TOOLPACK system by a straightforward naming
algorithm. For example, the data base produced by test run
#n of TDC t applying TOP p to EU e would perhaps be named
e/p/t/nDB. After each test run the user would be supplied
this name and the size of the data base itself and offered
the opportunity to SAVE the data base. SAVE'd data bases
would then be available for subsequent BROWSE'ing.

The data base name would be optional in the BROWSE command.
When omitted, the data base last generated would be
assumed.

The BROWSE command processor would determine from the
data base name the type of data base to be BROWSE'd (static
or dynamic) and invoke the necessary browsing tool.
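
A sketch of the naming and dispatch logic follows. The e/p/t/nDB form comes from the text; the rule used here to recognize a static data base (anything not of that four-part form) is purely an assumption, since the naming of static analysis data bases is not specified, and both function names are hypothetical.

```python
def make_test_db_name(eu, top, tdc, run):
    """Build the name suggested in the text: e/p/t/nDB."""
    return f"{eu}/{top}/{tdc}/{run}DB"


def browse(name=None, last_generated=None):
    """Pick the browser for a data base; fall back to the last one generated."""
    if name is None:
        name = last_generated
    if name is None:
        raise ValueError("no data base has been generated yet")
    parts = name.split("/")
    # Assumption: only dynamic test data bases have the four-part e/p/t/nDB form.
    kind = "dynamic" if len(parts) == 4 and parts[3].endswith("DB") else "static"
    return kind, name
```

Here `browse` merely reports which browser would be invoked; the browsers themselves are outside the scope of the sketch.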


E.  An  illustration  of  how  the  TOOLPACK  architecture  might 
be  used  to  support  the  process  of  constructing  a  program 


The  following  diagram  is  inserted  here  in  an  attempt  to 
demonstrate  that  the  TOOLPACK  system  architecture,  as 
presented  here,  is  capable  of  satisfying  the  requirements  to 
which  the  TOOLPACK  group  has  addressed  its  efforts.  Thus 
the diagram is intended to show that individuals attempting
to perform code creation, as outlined earlier in this document,
can be significantly aided and supported by the TOOLPACK
system architecture as described. The diagram depicts
what is thought to be a reasonable procedure for code creation.
Hence the fact that it seems to be strongly supported
by the proposed architecture is taken to be encouraging.

It  is  not  claimed  here  that  this  activity  diagram  is  a 
paradigm  of  "proper  procedure."  Hence  readers  who  perceive 
or  pursue  this  task  in  a  different  way  should  not  feel  that 
TOOLPACK  disapproves  of  them  or  will  not  support  them. 
Rather,  such  readers  are  strongly  encouraged  to  determine 
whether the TOOLPACK system will be useful to them in supporting
their activities.

In particular it should be noted that no symbolic execution
or formal verification capabilities are currently
proposed  for  inclusion  in  TOOLPACK,  nor  are  they  included  in 
the  activity  diagram.  This  reflects  the  perception  that 
mathematical  software  writers  currently  go  about  their  work 
without  these  capabilities.  As  discussed  earlier,  these 
capabilities  are  regarded  as  being  of  great  potential  value 
and  importance.  Hence  it  is  expected  that  they  may  be 
included  in  future  releases  of  TOOLPACK. 

Comments  and  Elaboration  on  Major  Activities 

1.  Create  regimen  of  test  cases  and  required  outcomes: 

An editor is used to create these TDC's; results are
stored in the data base for subsequent use, and each TDC is
indexed by a name supplied by the user.

2.  Compose  new  source  text: 

A text editor is used for source code creation. The
editor may be language dumb, or may incorporate various
types of language awareness; e.g., it may parse input, accepting
only syntactically correct source, and output a parse
tree and/or token list, and it may automatically do some
polishing as well. The output of this process may be PU source
text, token list, parse tree or some combination of the three.
Whatever the output, it is to be stored in the central file
system indexed by PU name and version id, perhaps supplied
by the user. The user may also define EU's as sets of PU
versions and transformation specifications and assign these
EU's names, thereby creating other file system entities.

3.  Polish  and/or  structure  text: 

The user may at this point wish to polish and/or structure
the source text created. The unpolished version of the PU
is to be purged automatically from the file system, unless the
user directs that the polished version be saved under a new
version name.

4.  Perform  static  analysis: 

The user requests "ANALYZE" and specifies a level of
thoroughness for analysis and an EU (by name). New EU's may
be defined at this point. Single unit and integration
analysis is done - lexical, syntactic, static semantic and
data flow - at user option. A data base of analytic results
is created for browsing by means of the BROWSE subsystem.

5.  Set  up  test  runs: 

The user creates TOP's, specifying types and thoroughness
of dynamic monitoring. He may modify or create new
TDC's here as well. This is basically an editing activity.
The user must create new TDC's or access TDC's created in
activity 1; an interactive editor would be useful here; a
source text editor may be used to inject new assertions in
the source text; a TOP editor may be used to build and
modify various TOP's.

6.  Run  dynamic  test(s): 

The  user  specifies  a  sequence  of  test  runs  as  a 
sequence of triples (EU, TOP, TDC) of named database entities;
test  runs  are  made  and  results  go  into  relational  data  bases 
for  perusal  by  the  BROWSE  processor. 

This involves automatic instrumentation, compilation,
link editing (including fetching of run-time libraries to
support monitoring), creation of data bases of results, and
creation and presentation to the user of requested results.

7.  Browse  source  text  and  test  execution  results: 

This  involves  use  of  a  query  system  and  information 
management  system  to  help  the  user  identify  and  understand 
errors well enough to fix them.

VII.  OVERVIEW OF TOOLPACK APPROACH.

The preceding section was devoted to a presentation of a
user's view of the TOOLPACK system. The purpose of this
section is to suggest an interior, or implementor's, view of
the TOOLPACK system. This section does not purport to be a
complete design specification. It is offered, rather, in
support of the contention that the TOOLPACK system, as
presented, is eminently implementable with existing
knowledge and technology. Hence the reader should feel comfortable
in considering the merits of the proposed set of
capabilities freed, to some extent, of worries about its
realizability.

A.  The File System

Clearly  the  primary  feature  of  the  proposed  TOOLPACK 
system  is  the  central  file  system  of  information  about  the 
subject program. The user is encouraged to think and plan
his  work  in  terms  of  it,  and  the  functional  tools  all  draw 
their  input  from  it  and  place  their  output  into  it. 

This  file  system  is  to  be  initialized  with  the  start  of 
a  project  and  remain  and  grow  throughout  the  lifetime  of  the 
project. There is no reason why two or three users may not all
access this file system, although we will make the implicit
assumption that it is accessed by one user at a time in a
nondestructive way.

The  TOOLPACK  system  itself  will  manage  the  file  system 
primarily  by  means  of  a  tree  structured  directory  system  and 
a  modular  set  of  file  accessing  and  updating  primitives. 
TOOLPACK  files  will  not  correspond  directly  to  host  machine 
files,  but  will  rather  be  mapped  onto  segments  of  one  or 
more large host system files. The TOOLPACK file accessing
and updating capabilities will effect this segmentation and
operate directly upon these large host system files. The
object of this approach is to reduce the overhead of dealing
directly with, and depending too heavily upon, host file systems.
An implementation of such a set of I/O capabilities (called
PIOS) has been written in portable Fortran [Hans 80a].
Experience with this system has shown that this approach can
be pursued without unacceptable loss of speed and efficiency.
Thus, this system is expected to be used at least
as a guide to an effective modularization of capabilities,
and will perhaps be incorporated into TOOLPACK in toto.
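
The mapping of logical files onto segments of one large host file can be pictured with the following sketch. It is not a description of PIOS; the class `SegmentedStore`, its append-only allocation and its in-memory directory are assumptions chosen to keep the example short.

```python
import io


class SegmentedStore:
    """Maps named logical files onto (offset, length) segments of one host file."""

    def __init__(self, host_file):
        self._host = host_file        # any seekable binary file object
        self._directory = {}          # logical name -> (offset, length)

    def write(self, name, data):
        # Append-only allocation: a rewritten file simply gets a new segment.
        self._host.seek(0, io.SEEK_END)
        offset = self._host.tell()
        self._host.write(data)
        self._directory[name] = (offset, len(data))

    def read(self, name):
        offset, length = self._directory[name]
        self._host.seek(offset)
        return self._host.read(length)
```

A store created as `SegmentedStore(io.BytesIO())` behaves the same way as one backed by a host file opened in binary read/write mode. Space left behind by rewritten files is never reclaimed in this sketch; a practical system would need a compaction or free-list scheme.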

A tree structured file directory system (PDS) has also
been written in portable Fortran [Hans 80b]. This is also
quite appealing as at least a model of effective functionality
and modularization, and perhaps as a body of code to be
directly incorporated into TOOLPACK. It offers the added
feature  of  being  designed  for  ready  interfacing  with  PIOS, 
thereby  forming  a  portable  file  directory  and  accessing 




mechanism.  This  tandem  will  go  far  towards  implementing  the 
TOOLPACK  file  system. 

B.  The Virtual Aspect of the File System and the Retention/
Replacement Module

A stated design objective for TOOLPACK is that it run
effectively on a wide range of machines, utilizing
larger amounts of storage when and if they can be made
available. One way in which large amounts of storage can be
effectively utilized is to store all derived and intermediate
entities for possible future reuse. Storage economies
can be gained by refusing to store those entities and
instead regenerating them as needed. The strategy for
retaining or regenerating these entities must be adjustable
and transparent. It is highly desirable that both the end
user and the tool ensemble always be safe in assuming that
any needed named entities and derived images will always be
available. Thus it is necessary that the TOOLPACK file
management system assume the responsibility for either
retrieving these items directly or having them created or
regenerated (in case storage exigencies precipitated their
deletion by TOOLPACK).

Perhaps  the  workings  of  this  virtual  file  system  scheme 
can  best  be  understood  through  an  example.  Suppose  one  of 
the  functional  tools  (e.g.,  the  static  analyzer)  needed 
access to the parse tree of a particular version of a particular
PU, let us call it SUBR/VER. The tool would request
(and subsequently receive) the parse tree through a subroutine
call such as

CALL DBFTCH('SUBR/VER/P', ARRAY, LEN)

where  ARRAY  is  the  name  of  the  array  within  the  tool  which 
is  to  receive  the  parse  tree,  and  LEN  is  a  specification  of 
the  length  of  ARRAY,  included  in  this  invocation  to  guard 
against  inadvertent  array  overflow. 

Subroutine DBFTCH would then use 'SUBR/VER/P' as an
index into the TOOLPACK file system directory in order to
look up the internal designation of the TOOLPACK file containing
the parse of SUBR/VER. Should there actually be
such  a  TOOLPACK  file,  there  would  also  be  a  length  specifier 
for  it.  The  length  specifier  would  be  used  to  determine 
whether  the  invoking  tool's  array  was  large  enough  to  hold 
the  parse.  If  so  DBFTCH  would  need  only  to  invoke  a  file 
I/O  primitive  to  read  the  indicated  TOOLPACK  file  into  the 
invoking  tool's  array. 

If the directory contained no entry for the parse tree,
DBFTCH would need to see that a parse was created. Guidance
for this process would come from an internal table
specifying how the various TOOLPACK derived images are to be
derived from each other. This table would be essentially a
directed acyclic graph (DAG) with the nodes representing the
various file system entity types, and the edges representing
processing capabilities. In particular, an edge would
represent the processing capability needed to produce the
entity at its head from the entity at its tail. It is worth
noting that the production of some entities (e.g., an annotated
flowgraph) might require more than one process
acting on more than one file system entity.

In  any  case  DBFTCH  would,  from  this  table,  produce  an 
ordered  list  of  the  processes  and  entities  which  would  be 
needed  to  produce  the  requested  entity  by  traversing  the 
dependency  DAG.  DBFTCH  would  then  proceed  down  this  list 
looking  to  see  which  entities  are  already  stored  in  the  file 
system.  Using  this  information  DBFTCH  would  then  invoke  in 
the  correct  order  only  those  processes  needed  to  produce  the 
desired  entity. 

Returning to our example, DBFTCH would look up 'P' in
the dependency DAG, which would then show that a parse tree
is derived from a token list by a parser and a token list is
derived from a source string by a lexical analyzer (lexer).
Thus DBFTCH would next check for the existence in the file
system of the token list for SUBR/VER. If it is present,
DBFTCH will invoke the parser, producing the required parse
tree. If the token list is absent, DBFTCH will first invoke
the lexical analyzer to produce a token list from the source
text, and then invoke the parser. If the source text should
not be in the file system, an error message would be passed
on to the user.
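
The fetch-or-regenerate behavior just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the DBFTCH design: the dependency table is simplified to a linear chain (source, token list, parse tree) rather than a general DAG, the kind codes 'S' and 'T' are invented by analogy with the 'P' suffix in the text, and the file system is modelled as a dictionary.

```python
# Dependency table: image kind -> (kind it is derived from, process name).
# The codes 'S' and 'T' are assumptions modelled on the 'P' suffix in the text.
DERIVED_FROM = {"T": ("S", "lexer"), "P": ("T", "parser")}


def fetch(store, processes, unit, kind, capacity):
    """Return the requested image of `unit`, regenerating it if necessary.

    store: dict mapping 'unit/kind' -> stored image.
    processes: dict mapping process name -> callable(image) -> image.
    capacity: size of the caller's array, checked to guard against overflow.
    """
    # Walk back through the table until a stored image is found.
    pending = []
    current = kind
    while f"{unit}/{current}" not in store:
        if current not in DERIVED_FROM:
            raise LookupError(f"no source text stored for {unit}")
        source_kind, process = DERIVED_FROM[current]
        pending.append((current, process))
        current = source_kind
    # Run only the missing steps, in dependency order, storing each result.
    image = store[f"{unit}/{current}"]
    for produced, process in reversed(pending):
        image = processes[process](image)
        store[f"{unit}/{produced}"] = image
    if len(image) > capacity:
        raise OverflowError("caller's array is too small for the requested image")
    return image
```

When the token list is already stored only the parser runs; when it is absent the lexer runs first. Handling entities that need several inputs, such as the annotated flowgraph mentioned above, would require a topological traversal of a true DAG instead of this chain walk.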

This virtual file system scheme could be stretched even
farther. Although it is currently anticipated that the file
system will hold in explicit form any formattings and structurings
of a given piece of source text, this is not necessary.
Such versions could be recreated by the file management
system only when needed by following a procedure such
as just outlined. Even whole static analysis or dynamic
testing data bases could be regenerated in this way. This
gives the file management system the flexibility to purge
large files to regain storage while still retaining the
ability to recreate these files when necessary.

This feature should prove particularly useful in hosting
the TOOLPACK system on smaller storage machines. Here
it may be necessary to permanently store only source text.
Under these circumstances all derived images and intermediate
entities will be routinely purged, requiring that they
be recreated whenever needed. This will result in extra
computation time to meet the user's request, but seems a
very reasonable trade for the lack of storage.

In order for this scheme to work best the strategy for
deciding which entities to delete and when to delete them
must be carefully determined. There appears to be little
experience in devising such strategies. Hence we must
expect that a lengthy period of experimentation, observation
and adjustment will be necessary. Thus our
replacement/retention strategy will be encapsulated in a
module to facilitate such experimentation and adjustment.
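
Encapsulation of this kind might take the form of an interchangeable policy object, as in the sketch below. The interface and the "largest first" rule are assumptions offered only to show how one strategy could be swapped for another during experimentation; no particular policy is proposed in the text.

```python
class RetentionPolicy:
    """Decides which derived images to purge; swap subclasses to experiment."""

    def victims(self, entries, bytes_needed):
        raise NotImplementedError


class LargestFirst(RetentionPolicy):
    """One possible policy: purge the largest regenerable images first."""

    def victims(self, entries, bytes_needed):
        # entries: dict mapping name -> (size, regenerable flag)
        chosen, freed = [], 0
        candidates = sorted((n for n, (_, regen) in entries.items() if regen),
                            key=lambda n: entries[n][0], reverse=True)
        for name in candidates:
            if freed >= bytes_needed:
                break
            chosen.append(name)
            freed += entries[name][0]
        return chosen
```

Source text is marked as not regenerable and is therefore never selected, which matches the requirement that only derived images and intermediate entities be purged.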

C.  The  Command  Language  Interpreter 

As  noted  earlier,  the  TOOLPACK  command  language  syntax 
has  yet  to  be  decided  upon.  Nevertheless,  it  is  reasonable 
at  this  stage  of  design  to  sketch  the  architecture  of  the 
processor which must effect the execution of TOOLPACK commands.

This  processor  -  the  TOOLPACK  command  interpreter  - 
will  probably  consist  of  three  phases:  command  syntactic 
analysis,  command  decomposition  into  TOOLPACK  functions,  and 
sequential  TOOLPACK  function  invocation. 

Of the three, the first, syntactic analysis, should be
the most straightforward. Once a command syntax is agreed
to, a parser generator should suffice for the production of
a parser capable of rendering commands into trees of command
tokens. Straightforward though this may appear, it seems
important to isolate syntax analysis in a separate phase in
order to facilitate change. It is recognized that users'
reactions to TOOLPACK may be strongly influenced by the perceived
friendliness and ease of use of the command language
itself. It thus seems important to enable changes in the
language when and if experience indicates they are desirable.
This will clearly be facilitated by using a
parser-generator-created parser as the first phase of the command
interpreter.

The second phase, command decomposition, will be more
complex, entailing 1) the selection of the standard template
of TOOLPACK functions indicated by the command and 2) the
elaboration of this template as indicated to be necessary by
option selections, file system status, and reporting and
contingency handling directives. In particular, it is
expected that the semantics of each TOOLPACK command will be
defined at least generally by a standard sequence of TOOLPACK
functional processing steps to be performed by individual
tool fragments. The flow of data structures through
these fragments will be effected by the definition, creation
and accessing of entities within the file system. Thus, the
construction of specific file system primitive invocations
will also be the responsibility of this second phase. In
view of the preceding discussions of the virtual file system
concept, it is clear that the regeneration and updating of
file system entities may also be entailed during command
elaboration.

The  end  product  of  this  phase  is  expected  to  be  a 
sequential  file  of  TOOLPACK  directives  describing  in  detail 
all steps needed to be carried out by TOOLPACK tool fragments
in order to effect the specified command in the exact
context of the current state of the TOOLPACK file system.
As such this phase might well be viewed as a pseudocompilation
into a machine independent intermediate code.

The  final  phase  is  the  actual  interpretation  process. 
Here  tool  fragments  and  file  system  accessing  primitives  are 
invoked  in  the  indicated  sequential  order,  with  allowances 
being  made  for  alteration  of  sequencing  due  to  errors  or 
other contingencies.
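
The three phases can be sketched in miniature as below. Since the command syntax has yet to be decided, the blank-separated syntax, the template contents and all function names are assumptions; the point is only to show how the phases hand results to one another.

```python
TEMPLATES = {  # hypothetical templates; the real command set is not yet fixed
    "ANALYZE": ["lex", "parse", "semantic_check", "data_flow"],
    "FORMAT": ["lex", "format"],
}


def parse_command(line):
    """Phase 1: syntactic analysis into (command, arguments)."""
    words = line.split()
    if len(words) < 2 or words[0] not in TEMPLATES:
        raise ValueError(f"unrecognized command: {line!r}")
    return words[0], words[1:]


def decompose(command, args, already_stored):
    """Phase 2: elaborate the template against the current file system state."""
    entity = args[0]
    return [(step, entity) for step in TEMPLATES[command]
            if (step, entity) not in already_stored]


def interpret(directives, fragments):
    """Phase 3: invoke tool fragments in order, stopping at the first failure."""
    done = []
    for step, entity in directives:
        if not fragments[step](entity):
            break
        done.append(step)
    return done
```

Phase 2 omits steps whose results are already stored, which is where the virtual file system concept enters; phase 3 shows the simplest possible contingency handling, namely abandoning the sequence when a fragment reports failure.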

D.  Major  TOOLPACK  Functional  Commands 

As an aid to understanding how major TOOLPACK functional
capabilities are to be fashioned out of smaller tool
fragments,  we  now  include  diagrams  indicating  the  way  in 
which  we  propose  to  effect  the  implementation  of  two  major 
functional  commands. 

1.  ANALYZE  (Static  Analysis) 

Figure  8  is  a  diagram  showing  how  execution  of  the 
ANALYZE command will be effected by the operation of TOOLPACK
tool fragments on file system entities. Some of these
entities  will  be  created  by  these  fragments,  but  others 
(such  as  the  token  list  and  symbol  table)  should  be  thought 
of  as  perhaps  having  been  created  by  the  execution  of  other 
TOOLPACK  commands. 

File  system  entities  are  identifiable  as  being  shown  in 
squared  boxes.  Tool  fragments  are  shown  in  circles  or 
ovals.  It  should  be  noted  that  most  of  the  indicated  tool 
fragments  have  been  built  in  at  least  prototype  form  as  part 
of  the  DAVE  project  at  the  University  of  Colorado. 

2.  EXECUTE  TEST  (Dynamic  Testing) 

Figure 9 shows how execution of the EXECUTE TEST command
will be implemented by tool fragments and functional
capabilities (such as compilation and loading) to be borrowed
from the host operating system and environment. It
has been suggested that the testing capability as proposed
in the NEWTON report [Feib 81] is too general to be comfortably
thought of as a single functional capability. Hence
the subdivision of NEWTON into smaller tool fragments and
the introduction of several more sharply named and focused
capabilities are being studied.

3.  Symbolic Execution

No symbolic execution capability is currently being
planned for inclusion in early releases of TOOLPACK. This
decision is influenced by the relatively high cost of creating
this capability and its unfamiliarity to the TOOLPACK
target user community. Both of these considerations are
expected to change with time, encouraging the eventual
inclusion of symbolic execution within TOOLPACK. This will
be facilitated by the establishment in early TOOLPACK
releases of a core set of tool fragments upon which a symbolic
execution capability can be built at a later date.


VIII.  SUMMARY. Considerable experience with isolated
individual  tools  has  led  us  to  believe  that  comprehensive 
collections  of  tools  are  possible  and  desirable.  Logical 
tool  integration  strategies  are  now  perceptible  and  are  also 
reasonable  as  objects  of  experimental  study.  The  TOOLPACK 
architecture  is  one  such  strategy.  A  system  of  this  sort  is 
under design and will be built and studied.

 1      PROCEDURE AREAS;
 2        DECLARE REAL A(20,20,2), INTEGER P1, P2, P3;
 3        PROCEDURE INIT (H, B);
 4          DECLARE INTEGER H, B, I, J, K, REAL XK;
 5          IF H > 20 THEN ERROR STOP;
 6          IF B > 20 THEN ERROR STOP;
 7          DO FOR I = 1 TO H;
 8            A(I, 1, 1) = I;
 9            DO FOR J = 2 TO B;
10              A(I, J, 1) = A(I, J-1, 1) + I;
11            END;
12          END;
13          K = 2;
14          XK = 2.0;
15          DO FOR I = 1 TO H;
16            DO FOR J = 1 TO B;
17              A(I, J, K) = A(I, J, K-1) / XK;
18            END;
19          END;
20        END;
21        PROCEDURE LOOKUP (I, J, K);
22          DECLARE INTEGER I, J, K;
23          CASE;
24, 25        K = 1: PRINT "AREA OF" I, J "RECTANGLE IS" A(I, J, K);
26, 27        K = 2: PRINT "AREA OF" I, J "TRIANGLE IS" A(I, J, K);
28, 29        ELSE: PRINT "PARAMETER ERROR: K = " K
30          END;
31        END;
32        CALL INIT (20, 20);
33        LOOP FOREVER;
34          READ P1, P2, P3;
35          IF P3 = 0 THEN STOP;
36          ELSE CALL LOOKUP (P1, P2, P3);
37        END;
38      END;

FIGURE 1: An example program

 1      PROCEDURE AREAS;
 2        DECLARE REAL A(20,20,2), INTEGER P1, P2, P3;
 3        PROCEDURE INIT (H, B);
 4          DECLARE INTEGER H, B, I, J, K, REAL XK;
 5          IF H > 20 THEN ERROR STOP;
 6          IF B > 20 THEN ERROR STOP;
 7          DO FOR I = 1 TO H;
E1            IF ~(1 <= I <= 20) THEN SUBSCRIPT RANGE ERROR;
E2            IF ~(1 <= 1 <= 20) THEN SUBSCRIPT RANGE ERROR;
E3            IF ~(1 <= 1 <= 2) THEN SUBSCRIPT RANGE ERROR;
 8            A(I, 1, 1) = I;
 9            DO FOR J = 2 TO B;
E4              IF ~(1 <= I <= 20) THEN SUBSCRIPT RANGE ERROR;
E5              IF ~(1 <= J <= 20) THEN SUBSCRIPT RANGE ERROR;
E6              IF ~(1 <= 1 <= 2) THEN SUBSCRIPT RANGE ERROR;
E7              IF ~(1 <= I <= 20) THEN SUBSCRIPT RANGE ERROR;
E8              IF ~(1 <= J-1 <= 20) THEN SUBSCRIPT RANGE ERROR;
E9              IF ~(1 <= 1 <= 2) THEN SUBSCRIPT RANGE ERROR;
10              A(I, J, 1) = A(I, J-1, 1) + I;
11            END;
12          END;
13          K = 2;
14          XK = 2.0;
15          DO FOR I = 1 TO H;
16            DO FOR J = 1 TO B;
E10             IF ~(1 <= I <= 20) THEN SUBSCRIPT RANGE ERROR;
E11             IF ~(1 <= J <= 20) THEN SUBSCRIPT RANGE ERROR;
E12             IF ~(1 <= K <= 2) THEN SUBSCRIPT RANGE ERROR;
E13             IF ~(1 <= I <= 20) THEN SUBSCRIPT RANGE ERROR;
E14             IF ~(1 <= J <= 20) THEN SUBSCRIPT RANGE ERROR;
E15             IF ~(1 <= K-1 <= 2) THEN SUBSCRIPT RANGE ERROR;
E16             IF XK = 0 THEN ZERODIVIDE ERROR;
17              A(I, J, K) = A(I, J, K-1) / XK;
18            END;
19          END;
20        END;
21        PROCEDURE LOOKUP (I, J, K);
22          DECLARE INTEGER I, J, K;
23          CASE;
24            K = 1:
E17             IF ~(1 <= I <= 20) THEN SUBSCRIPT RANGE ERROR;
E18             IF ~(1 <= J <= 20) THEN SUBSCRIPT RANGE ERROR;
E19             IF ~(1 <= K <= 2) THEN SUBSCRIPT RANGE ERROR;
25              PRINT "AREA OF" I, J "RECTANGLE IS" A(I, J, K);
26            K = 2:
E20             IF ~(1 <= I <= 20) THEN SUBSCRIPT RANGE ERROR;

E21 

IF  ~  ( 1 
ERROR; 

<=  j 

<  = 

20) 

THEN 

SUBSCRIPT 

RANGE 

E22 

IF  ~(1 
ERROR; 

<=  K 

2) 

THEN 

SUBSCRIPT 

RANGE 

27 

PRINT 
A(  I,  J 

"AREA 

,  *kT7 

OFM 

I,  J 

"TRIANGLE  IS" 

28,29  ELSE:  PRINT  "PARAMETER  ERROR:  K  =  "  K; 

30  END; 

31  END; 

32  CALL  INIT  (20,20); 

33  LOOP  FOREVER; 

34  READ  PI,  P2 ,  p3 ; 

35  IF  P3  =  0  THEN  STOP ; 

36  ELSE  CALL  LOOKUP  (P1  P2,  P3 ) ; 

3  7  END ; 

38  END; 

FIGURE  2 

The program of Figure 1, with probes for zero-divide
and subscript range errors inserted. The probes
shown are those which would be inserted by a naive
dynamic test tool and have statement numbers
preceded by the letter "E".
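
The probe insertion of Figure 2 can be mimicked in a modern language. The following is only a sketch in Python, not the paper's tool: the names `check_subscripts`, `check_divisor`, `SubscriptRangeError` and `ZeroDivideError` are hypothetical, and the array A is modeled as a dictionary keyed by 1-based subscripts. Each call corresponds to a group of E-numbered probes placed before the statement it guards.

```python
class SubscriptRangeError(Exception):
    """Raised when a subscript probe fails (probes E1-E15, E17-E22)."""


class ZeroDivideError(Exception):
    """Raised when the divisor probe fails (probe E16)."""


BOUNDS = (20, 20, 2)  # declared extents of A(20,20,2)


def check_subscripts(i, j, k):
    """One range check per subscript, as a naive tool would emit."""
    for value, upper in zip((i, j, k), BOUNDS):
        if not (1 <= value <= upper):
            raise SubscriptRangeError(f"{value} not in 1..{upper}")


def check_divisor(x):
    """Zero-divide probe placed before a division."""
    if x == 0:
        raise ZeroDivideError("division by zero")


def init(h, b):
    """Statements 3-20 of Figure 1 with the probes of Figure 2."""
    if h > 20 or b > 20:
        raise ValueError("ERROR STOP")
    a = {}
    for i in range(1, h + 1):
        check_subscripts(i, 1, 1)            # E1-E3
        a[i, 1, 1] = float(i)
        for j in range(2, b + 1):
            check_subscripts(i, j, 1)        # E4-E6
            check_subscripts(i, j - 1, 1)    # E7-E9
            a[i, j, 1] = a[i, j - 1, 1] + i
    k, xk = 2, 2.0
    for i in range(1, h + 1):
        for j in range(1, b + 1):
            check_subscripts(i, j, k)        # E10-E12
            check_subscripts(i, j, k - 1)    # E13-E15
            check_divisor(xk)                # E16
            a[i, j, k] = a[i, j, k - 1] / xk
    return a
```

Several of these checks, such as those on the constant subscript 1, can never fail; this redundancy is presumably what makes the tool "naive".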




1      PROCEDURE AREAS;
2         DECLARE REAL A(20,20,2), INTEGER P1, P2, P3;
3         PROCEDURE INIT (H, B);
A1           ASSERT NO SIDE-EFFECTS;
4            DECLARE INTEGER H, B, I, J, K, REAL XK;
5            IF H > 20 THEN ERROR STOP;
6            IF B > 20 THEN ERROR STOP;
7            DO FOR I = 1 TO H;
8               A(I, 1, 1) = I;
9               DO FOR J = 2 TO B;
10                 A(I, J, 1) = A(I, J-1, 1) + I;
A2                 ASSERT A(I, J, 1) = I*J;
11              END;
12           END;
13           K = 2;
14           XK = 2.0;
15           DO FOR I = 1 TO H;
16              DO FOR J = 1 TO B;
17                 A(I, J, K) = A(I, J, K-1) / XK;
A3                 ASSERT A(I, J, 2) = 0.5 * A(I, J, 1);
18              END;
19           END;
20        END;
21        PROCEDURE LOOKUP (I, J, K);
A4           ASSERT NO SIDE-EFFECTS;
22           DECLARE INTEGER I, J, K;
A5           ASSERT 1 <= I <= 20;
A6           ASSERT 1 <= J <= 20;
23           CASE;
24,25           K = 1: PRINT "AREA OF" I, J "RECTANGLE IS" A(I, J, K);
26,27           K = 2: PRINT "AREA OF" I, J "TRIANGLE IS" A(I, J, K);
28,29           ELSE: PRINT "PARAMETER ERROR: K = " K;
30           END;
31        END;
32        CALL INIT (20,20);
33        LOOP FOREVER;
34           READ P1, P2, P3;
35           IF P3 = 0 THEN STOP;
36           ELSE CALL LOOKUP (P1, P2, P3);
37        END;
38     END;

FIGURE  3 


The program of Figure 1 as it might be augmented by
assertions capturing the intent of the code.

1      PROCEDURE AREAS;
2         DECLARE REAL A(20,20,2), INTEGER P1, P2, P3;
3         PROCEDURE INIT (H, B);
4            DECLARE INTEGER H, B, I, J, K, REAL XK;
P1,1         DECLARE INTEGER HTEMP, BTEMP;
P1,2         HTEMP = H;
P1,3         BTEMP = B;
5            IF H > 20 THEN ERROR STOP;
6            IF B > 20 THEN ERROR STOP;
7            DO FOR I = 1 TO H;
8               A(I, 1, 1) = I;
9               DO FOR J = 2 TO B;
10                 A(I, J, 1) = A(I, J-1, 1) + I;
P2,1               IF A(I, J, 1) ~= I * J THEN PRINT "ASSERTION
                      VIOLATION AFTER STATEMENT 10" A(I, J, 1), I, J;
11              END;
12           END;
13           K = 2;
14           XK = 2.0;
15           DO FOR I = 1 TO H;
16              DO FOR J = 1 TO B;
17                 A(I, J, K) = A(I, J, K-1) / XK;
P3,1               IF A(I, J, 2) ~= 0.5 * A(I, J, 1) THEN
                      PRINT "ASSERTION VIOLATION AFTER STATEMENT
                      17" A(I, J, 2), I, J;
18              END;
19           END;
P1,4         IF H ~= HTEMP THEN PRINT "SIDE EFFECTS VIOLATION
                FOR H" H, HTEMP;
P1,5         IF B ~= BTEMP THEN PRINT "SIDE EFFECTS VIOLATION
                FOR B" B, BTEMP;
20        END;
21        PROCEDURE LOOKUP (I, J, K);
22           DECLARE INTEGER I, J, K;
P4,1         DECLARE INTEGER ITEMP, JTEMP, KTEMP;
P4,2         ITEMP = I;
P4,3         JTEMP = J;
P4,4         KTEMP = K;
P5,1         IF ~(1 <= I <= 20) THEN PRINT "ASSERTION VIOLATION
                AFTER STATEMENT 22" I;
P6,1         IF ~(1 <= J <= 20) THEN PRINT "ASSERTION VIOLATION
                AFTER STATEMENT 22" J;
23           CASE;
24,25           K = 1: PRINT "AREA OF" I, J "RECTANGLE IS" A(I, J, K);
26,27           K = 2: PRINT "AREA OF" I, J "TRIANGLE IS" A(I, J, K);
28,29           ELSE: PRINT "PARAMETER ERROR: K = " K;
30           END;
P4,5         IF I ~= ITEMP THEN PRINT "SIDE EFFECTS VIOLATION
                FOR I" I, ITEMP;
P4,6         IF J ~= JTEMP THEN PRINT "SIDE EFFECTS VIOLATION
                FOR J" J, JTEMP;
P4,7         IF K ~= KTEMP THEN PRINT "SIDE EFFECTS VIOLATION
                FOR K" K, KTEMP;
31        END;
32        CALL INIT (20,20);
33        LOOP FOREVER;
34           READ P1, P2, P3;
35           IF P3 = 0 THEN STOP;
36           ELSE CALL LOOKUP (P1, P2, P3);
37        END;
38     END;

FIGURE  4 


The program of Figure 1 as it might be augmented by
probes inserted by an assertion checking tool in
response to the assertions shown in Figure 3. The
inserted probes are denoted by line numbers beginning
with P. Line number PI,J is attached to the
Jth statement generated as a result of assertion AI
in Figure 3.
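
The translation from assertions to probes can be illustrated with a small Python sketch of procedure LOOKUP. This is an assumption-laden rendering, not the assertion checking tool itself: `build_table` and `lookup` are hypothetical names, the parameters I, J, K are passed in a mutable dictionary so that a "side effect" on them is observable, and messages are collected in a list instead of being printed.

```python
def build_table(h=20, b=20):
    """Table of rectangle (K = 1) and triangle (K = 2) areas, as INIT builds it."""
    table = {}
    for i in range(1, h + 1):
        for j in range(1, b + 1):
            table[i, j, 1] = float(i * j)
            table[i, j, 2] = i * j / 2.0
    return table


def lookup(table, args):
    """LOOKUP with the probes generated for assertions A4, A5 and A6."""
    messages = []
    saved = dict(args)                      # P4,1 to P4,4: save I, J, K
    for name in ("I", "J"):                 # P5,1 and P6,1: range assertions
        if not (1 <= args[name] <= 20):
            messages.append(
                f"ASSERTION VIOLATION AFTER STATEMENT 22 {args[name]}")
    i, j, k = args["I"], args["J"], args["K"]
    area = table.get((i, j, k))             # None if the entry does not exist
    if k == 1:
        messages.append(f"AREA OF {i}, {j} RECTANGLE IS {area}")
    elif k == 2:
        messages.append(f"AREA OF {i}, {j} TRIANGLE IS {area}")
    else:
        messages.append(f"PARAMETER ERROR: K = {k}")
    for name in ("I", "J", "K"):            # P4,5 to P4,7: side-effect checks
        if args[name] != saved[name]:
            messages.append(
                f"SIDE EFFECTS VIOLATION FOR {name} "
                f"{args[name]}, {saved[name]}")
    return messages


table = build_table()
ok = lookup(table, {"I": 3, "J": 4, "K": 2})
bad = lookup(table, {"I": 21, "J": 4, "K": 1})
```

As in Figure 4, a violated assertion only produces a message; execution continues past the probe.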
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The  flowgraphs  of  the  three  procedures  in  the  example  program  of  Fig.  1. 
The  nodes  are  numbered  by  the  statement  of  Figure  1.  For  each  node,  the 
program  variables  which  are  defined  there  and  referenced  there  are  listed. 
Note  that  node  36  represents  a  procedure  invocation  with  variables  as 
arguments.  Thus  the  ref  and  def  lists  cannot  be  completed. 
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Create  regimen  of  test  cases 
and required outcomes (TDC's).

[Figure 6 is a flowchart; only its labels are recoverable.
Boxes: Compose new source text (code and assertions) (P.U.'s and E.U.'s).
EXECUTE dynamic tests (configure E.U.'s, TOP's, TDC's). Set up new test.
(7) BROWSE source text and execution history. (8) Create different test
data and/or activate more probes and monitors. Restructure code (last
resort). Polish and/or structure code; rerun static and dynamic tests.
Edge labels: errors identified; no errors; errors noted, not understood;
still no help; run new tests; confident code is now correct; done.]

Figure 6


[Figure 7 diagram: TOOLPACK command -> parsed tree representation of
command -> file of machine-independent directives to invoke tool
fragments and file system primitives -> sequential invocation of tool
fragments and file system primitives.]


Figure 7. Schematic representation of the
TOOLPACK command interpreter

A MODIFIED KROENIG-PENNEY MODEL


Francis  E.  Council,  Jr.1 


ABSTRACT .  The  Kroenig-Penney  model  of  a  crystal  has  been  modified  such 
that  it  can  be  related  to  specific  materials.  One  electron  Green's  functions 
have  been  used  to  introduce  ionization  potentials  and  thus  gain  specificity. 

This  approach  permits  polyvalent  materials  to  be  studied  with  applications  to 
the  theory  of  elasticity  where  change  of  the  internal  energy  of  a  crystal  is 
considered  as  a  result  of  deformations. 

I.  INTRODUCTION.  The  binding  energy  of  a  solid  is  considered  as  result¬ 
ing  from  the  alteration  of  the  valence  electron  wave  functions  and  the  ion  core 
wave  functions.  Since  the  core  electrons  shield  each  other,  the  primary  cause 
of  the  binding  energy  is  the  valence  electrons.  There  are  several  ways  of  de¬ 
scribing  the  valence  electrons;  Frohlich's  [1]  approach,  as  well  as  that  of 
many  others  not  listed,  was  to  divide  a  crystal  of  the  metal  into  polyhedrons. 
Each  polyhedron  was  associated  with  one  atom  and  it  was  assumed  that  there  was 
little,  if  any,  overlap  of  the  wave  functions  of  the  electrons  of  one  polyhed¬ 
ron  or  cell  with  the  electrons  of  another  cell.  This  approach  ignores  the  long 
range  Coulombic  interactions  between  ion-cores  and  the  valence  electrons  since 
the  boundary  condition  is  that  the  normal  derivative  of  the  wave  function  is 
equal  to  zero  at  the  boundary  of  a  cell. 

If a polyvalent atom is being considered, then the wave function is usually
approximated as the result of an average potential; Raimes [2,3] is an example
of this. Sachs [4] showed how the original Wigner-Seitz model with only
one electron in a cell can be abandoned by assuming a uniform charge
distribution. The free electron approximation was the approach that Wigner and
Seitz [5,6] and many others have used, whereas the tight binding approximation,
in which a valence electron is affected to a much greater degree by the
Coulombic potential of the parent ion than by its neighbors, was used by Mott
and Jones [7]. Both the tight binding approximation and the free electron
approach have some applicability, particularly for the alkali metals. A
possible reason that both methods are useful is that the Fermi surfaces for
these metals are approximately spherical, so that pressure changes do not
markedly affect a Fermi surface, although Bardeen [8] has indicated that there
is a fifteen percent difference between the experimental and theoretical
values of the compressibility of lithium.

In view of the preceding statements, some other approach should be used if
the properties of a polyvalent material are to be described properly. Also,
Nozieres and Pines [9] have indicated that the kinetic energy and the potential
play roughly comparable roles in determining electron behavior, implying that
neither the tight binding nor the free electron approximation is adequate for a
proper description of the valence electrons. One way of combining these two
descriptions is to use Bloch functions with a Kroenig-Penney model of the
crystal, modified such that both the Coulombic and repulsive potentials are
included.

^Formerly with Management Information Systems Directorate, Army Mobility
Equipment Research and Development Command, Fort Belvoir, VA. Dr. Council is
presently with Vitro Laboratories, Inc., Silver Spring, MD.
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II .  THE  TIGHT  BINDING  APPROXIMATION  WAVE  FUNCTION.  One  way  of  developing 
wave  functions  that  are  suitable  for  the  tight  binding  approximation  is  to  use 
one  electron  Green's  functions.  Start  first  with  the  wave  function  in  the 
Schroedinger  representation, 

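$$H\,\langle r|t\rangle = i\hbar\,\frac{\partial}{\partial t}\,\langle r|t\rangle \tag{1}$$

with an associated Green's function,

$$\left(i\hbar\,\frac{\partial}{\partial t} - H \pm i\varepsilon\right) G_{\pm}(r,r';t,t') = \delta(r - r')\,\delta(t - t'). \tag{2}$$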


The Green's function is a function of two spatial variables, r and r'; two time
variables, t and t'; and a parameter $\varepsilon$, introduced in order that the
passage of time may be considered in both a forward and reverse direction. The
Green's function takes the wave function at one position and time to some other
position and time,

$$\langle r|t\rangle = \begin{cases} \int G_{-}(r,r';t,t')\,\langle r'|t'\rangle\,dr' & \text{for } t > t' \\ i\hbar \int G_{+}(r,r';t,t')\,\langle r'|t'\rangle\,dr' & \text{for } t < t' \end{cases} \tag{3}$$

If the Hamiltonian is independent of time, then a Fourier transform of the
time variable gives

$$G_{\pm}(r,r';t,t') = \int dE\; G_{\pm}(r,r';E)\, e^{-iE(t - t')/\hbar} \tag{4}$$

and $G_{\pm}(r,r';E)$ is a solution of the equation,

$$(E - H \pm i\varepsilon)\,G_{\pm}(r,r';E) = \delta(r - r') \tag{5}$$



Since the Green's function is not translationally invariant, if a Fourier
transform is performed on the Green's function with respect to the spatial
variables, then both variables must be transformed. Consequently,

$$G_{\pm}(p,p';E) = \langle p|G_{\pm}|p'\rangle = \frac{1}{(2\pi\hbar)^{3}} \iint dr\,dr'\; e^{-i(p\cdot r - p'\cdot r')/\hbar}\, G_{\pm}(r,r';E) \tag{6}$$



Now return to equation (5), pre- and post-multiply by $\langle p|$ and
$|p'\rangle$ respectively, and insert a unity expression,
$|p''\rangle\langle p''|$, with the result

$$\langle p|E - H \pm i\varepsilon|p''\rangle\,\langle p''|G_{\pm}|p'\rangle = \delta(p - p') \tag{7}$$
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(8) 


The  kinetic  energy  of  equation  (7)  is  expressed  as 
<p|H2V2/2m |p' >  =  6(p  -  p') 


If a tight binding approximation is being considered, then the potential
can be introduced as a Dirac delta function potential. By means of this
technique, the potential energy for each of the valence electrons can be
expressed in terms of the various ionization potentials. If the Dirac delta
function potential is

v  =  -n(r*) 


then  the  ionization  potential  is  introduced  by  letting  V\  be  equal  to  the  ioni¬ 
zation  potential.  The  Fourier  transform  of  this  potential  is 

V(p')  =  -  —0—  (10) 

(27tJ4)3/2 

and 

<p|vrns|p’>  =  -  n  <5(p  -p1)  (11) 

A  way  of  further  development  is  to  use  perturbation  theory  by  modifying  an 
equation  by  Harrison  [10], 

G(p,p')  ~G° (p)S(p-p' )+G° (p)<p| V |p' >G° (p'  ) 

+  f 3°(p)<p|V|pi  >dp  G°(Pi)<pi|V|p,>G°(p,)  +  .  .  .  (12) 

with  the  second  wave  number  index  being  suppressed  since  the  indices  are  always 
the  same.  The  zero  order  Green's  function,  G°(p),  is  obtained  from  equation  (5) 
by  letting  the  potential  energy  of  the  Hamiltonian  be  equal  to  zero.  If  equa¬ 
tion  (11)  is  considered,  evaluation  of  the  second  and  third  terms  of  equation 
(12)  gives 

-G0(p)G°(p')nS(p-p’)  (13) 


G°(p)G°(p)G  (p')n2  6(p-p').  (14) 

It  is  obvious  from  the  preceding  that  a  series  is  being  developed  such  that 

G(p,p* )  =  - (p2/2m)  6(p  -  p')  (15) 

1  +  r|(p’)2/2m  (27t|A)3/2 
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With  a  suitable  Green’s  function  having  been  developed,  then  the  individ¬ 
ual  wave  functions  in  p  space  are  obtained  as 

$$\chi(p) = \int G(p,p')\,\chi(p')\,dp' \tag{16}$$

The  wave  function  in  r  space  is  obtained  by  an  inverse  Fourier  transform  of 
equation  (16) 

TTB  =  (2t ifrp/2^  P  t|AxTB^p')dp  ^17) 

with the subscript TB being used to indicate that the tight binding
approximation has been used. The momentum wave functions that are used here are
semi-empirical wave functions developed by Duncanson and Carlson [11, 12] based
on initial work by Slater [13]. Slater developed some approximate analytical
expressions which correct for the shielding effect of the core electrons of the
nucleus.

III. THE FREE ELECTRON APPROXIMATION. If an electron is no longer primarily
influenced by its parent ion, then its behavior is described by the quasi-free
electron approximation. The influence of the periodic potential of a crystal
causes the electron's motion to be no longer perfectly free. The potential
energy affecting the electron can be considered as a combination of Coulombic
and repulsive potentials as a function of position; in order to simplify
computations it will be considered as a constant. The effect of temperature
could also be included at this time; again in order to simplify computations,
this effect will be ignored.

These quasi-free electrons exist in a potential well of energy. The electrons
fill up all the energy states in the well to a level above the bottom of
the well with the potential depth being related to the cut off energy as

$$V_0 = E_c + e\psi. \tag{18}$$

The expression $e\psi$ is the minimum energy to remove an electron from a metal
and is usually considered as the work function. Consequently, if the work
function is expressed in electron volts and the magnitude of the depth of the
potential well for a particular valence electron is its ionization potential,
then the cut off momentum value is

$$p_0 = \left(2m(V_i - e\psi)\right)^{1/2} \tag{19}$$
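
As a numerical illustration of equation (19), the sketch below evaluates the cut off momentum in SI units. It is not part of the original derivation: the function name `cutoff_momentum` is hypothetical, and the sample values (an ionization potential of about 5.39 eV and a work function of about 2.9 eV, roughly those of lithium) are approximate and chosen only for illustration.

```python
import math

ELECTRON_MASS = 9.1093837e-31   # electron rest mass, kg
EV = 1.602176634e-19            # joules per electron volt


def cutoff_momentum(ionization_ev, work_function_ev):
    """Cut off momentum sqrt(2 m (V_i - work function)), in kg m/s.

    Both energies are given in electron volts.
    """
    depth = (ionization_ev - work_function_ev) * EV
    if depth < 0:
        raise ValueError("work function exceeds the well depth")
    return math.sqrt(2.0 * ELECTRON_MASS * depth)


# Illustrative, approximate inputs (assumed, not taken from the paper).
p0 = cutoff_momentum(5.39, 2.9)
```

For a polyvalent material the same function would be called once per valence electron, each with its own ionization potential.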

The  expected  value  of  the  magnitude  of  the  momentum  is 

=  |  (2m(V.  -  e  it*))"5 
J  X 

(20) 




such  that  a  wave  function  related  to  an  average  value  of  the  momentum  is 


V 


QFE  ( 2n )6/ 


1  ip.  ■ 

3/2  e  1 


r/H 


(21) 


with  the  subscript 


indicating  that  the  quasi  free  approximation  is  being  used. 


The Bloch Wave Functions. Two wave functions have been developed, one
suitable for the quasi-free approximation and the other for the tight binding
approximation. These can be combined into a Bloch wave function as


(22) 


with  (r  -  r.)  being  related  to  the  periodic  potential.  The  wave  function 
1  b  j 

for  a  sum  of  wave  functions  as  would  be  reflected  by  the  different  ionization 
potentials  is 


(23) 


with  the  j  being  a  subscript  associated  with  the  different  ionization  poten¬ 
tials.  For  example,  for  a  divalent  material  with  ionization  potentials  of 
and  for  the  valence  electrons, 


_  i  V 
-  yj- 

j 


VI1 


ei(2m(V.  -  e  <J0) 


r  eip*  (r  -  jr)Xi(P*  )(p2/2m)5(p  -  p' )dp' 

J  1  +  VCp'WZmjUTlH)*'2 


(24) 


i (2m(V2  -  e  4<))'sr  f  ip-(l  -  r . )X2 (p‘ ) (p2/2m)fi(p  -  p’)dp’ 


/ 


1  +  V2  ( (p 1  )2/2m)  (27tjfl) 3 /2 


IV. CONCLUSIONS. A prescription by which a Kroenig-Penney model of a
crystal can be modified for specific materials has been obtained. This
development can be modified for greater precision if the average potential of a
crystal is given as a function of position or if the effect of temperature is
included. The primary reason for this derivation is to facilitate calculations
in which the change of internal energy of a crystal is a result of deformations.
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COMPUTATION  OF  MATRIX  CHAIN  PRODUCTS 


T. C. Hu and M. T. Shing
University of California, San Diego
La Jolla, CA 92093


ABSTRACT. This paper considers the computation of matrix chain products
of the form $M_1 \times M_2 \times \cdots \times M_{n-1}$. If the matrices are of
different dimensions, the order in which the matrices are computed affects the
number of operations. An optimum order is an order which minimizes the total
number of operations. We present some theorems about an optimum order of
computing the matrices. Based on these theorems, algorithms for finding an
optimum order are developed.

1. INTRODUCTION. Consider the evaluation of the product of n-1 matrices

$$M = M_1 \times M_2 \times \cdots \times M_{n-1} \tag{1}$$

where $M_i$ is a $w_i \times w_{i+1}$ matrix. Since matrix multiplication
satisfies the associative law, the final result M in (1) is the same for all
orders of multiplying the matrices. However, the order of multiplication greatly
affects the total number of operations to evaluate M. The problem is to find an
optimum order of multiplying the matrices such that the total number of
operations is minimized. Here, we assume that the number of operations to
multiply a $p \times q$ matrix by a $q \times r$ matrix is pqr.
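
To make the effect of the multiplication order concrete, the following Python sketch counts operations under the pqr cost model for two ways of parenthesizing three matrices. The function `chain_cost` and the sample dimensions are illustrative assumptions, not taken from the paper.

```python
def chain_cost(order, dims):
    """Return (rows, cols, operations) for one parenthesization.

    `order` is either an index into `dims` (a single matrix) or a pair
    (left, right) of nested orders; `dims[i]` is the (rows, cols) of matrix i.
    """
    if isinstance(order, int):
        rows, cols = dims[order]
        return rows, cols, 0
    left, right = order
    p, q, cost_left = chain_cost(left, dims)
    q2, r, cost_right = chain_cost(right, dims)
    if q != q2:
        raise ValueError("incompatible dimensions")
    # Multiplying a p x q matrix by a q x r matrix costs p*q*r operations.
    return p, r, cost_left + cost_right + p * q * r


# Hypothetical dimensions: 10 x 100, 100 x 5 and 5 x 50.
dims = [(10, 100), (100, 5), (5, 50)]
left_first = chain_cost(((0, 1), 2), dims)[2]    # (M1 x M2) x M3
right_first = chain_cost((0, (1, 2)), dims)[2]   # M1 x (M2 x M3)
```

With these dimensions the first order needs 7500 operations and the second 75000, a factor of ten for the same product.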

In refs. 1 and 6, a dynamic programming algorithm is used to find an optimum
order. The algorithm needs $O(n^3)$ time and $O(n^2)$ space. In ref. 2, Chandra
proposed a heuristic algorithm to find an order of computation which requires no
more than $2T_o$ operations, where $T_o$ is the total number of operations to
evaluate (1) in an optimum order. This heuristic algorithm needs only O(n) time.
Chin (ref. 3) proposed an improved heuristic algorithm to give an order of
computation which requires no more than $1.25\,T_o$ operations. This improved
heuristic algorithm also needs only O(n) time.
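
For reference, the dynamic programming approach can be sketched as follows. This is the standard textbook formulation written in Python under the pqr cost model, not a transcription of the algorithms in refs. 1 and 6; the name `min_chain_cost` is hypothetical.

```python
def min_chain_cost(w):
    """Minimum number of operations to multiply a matrix chain.

    Matrix i (0-based) has dimensions w[i] x w[i+1], so a list of n weights
    describes n-1 matrices. Uses O(n^3) time and O(n^2) space.
    """
    n = len(w) - 1  # number of matrices
    # cost[i][j] is the cheapest way to multiply matrices i..j inclusive.
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + w[i] * w[k + 1] * w[j + 1]
                for k in range(i, j)  # split between matrix k and k+1
            )
    return cost[0][n - 1] if n > 0 else 0
```

The table `cost` accounts for the quadratic space, and the three nested loops (two explicit, one inside `min`) for the cubic time.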

In this paper we first transform the matrix chain product problem into a
problem in graph theory, the problem of partitioning a convex polygon into
non-intersecting triangles (see ref. 8); then we state several theorems about
the optimum partitioning problem. Based on these theorems, algorithms for
finding optimum partitions are developed.

2. PARTITIONING A CONVEX POLYGON. Given an n-sided convex polygon,
such as the hexagon shown in Fig. 1, the number of ways to partition the polygon
into (n-2) triangles by non-intersecting diagonals is a Catalan number (see, for
example, ref. 7). Thus, there are two ways to partition a convex quadrilateral,
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five  ways  to  partition  a  convex  pentagon,  and  fourteen  ways  to  partition  a  convex 
hexagon. 
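
The counts quoted above follow from the closed form of the Catalan numbers: an n-gon has $C_{n-2}$ partitions, where $C_m = \binom{2m}{m}/(m+1)$. A minimal Python check, with the hypothetical helper name `triangulation_count`:

```python
from math import comb


def triangulation_count(n):
    """Number of ways to cut a convex n-gon into n-2 triangles."""
    if n < 3:
        raise ValueError("a polygon needs at least three sides")
    m = n - 2
    # Catalan number C_m = binomial(2m, m) / (m + 1); the division is exact.
    return comb(2 * m, m) // (m + 1)


counts = [triangulation_count(n) for n in (4, 5, 6)]
```

The count grows exponentially in n, which is why exhaustive enumeration is not a practical way to find an optimum partition.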

Let every vertex $V_i$ of the convex polygon have a weight $w_i$. We can define
the cost of a given partition as follows: The cost of a triangle is the product
of the weights of the three vertices, and the cost of partitioning a polygon is
the sum of the costs of all its triangles. For example, the cost of the
partition of the hexagon in Fig. 1 is

$$w_1w_2w_3 + w_1w_3w_6 + w_3w_4w_6 + w_4w_5w_6. \tag{2}$$

Fig. 1

If we erase the diagonal from $V_3$ to $V_6$ and replace it by the diagonal from
$V_1$ to $V_4$, then the cost of the new partition will be

$$w_1w_2w_3 + w_1w_3w_4 + w_1w_4w_6 + w_4w_5w_6. \tag{3}$$
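
The cost definition translates directly into code. In the Python sketch below, `partition_cost` is a hypothetical helper and the weights are made-up sample values. The list `after_swap` is the partition whose cost is given by (3); `before_swap` uses the diagonal from vertex 3 to vertex 6 instead of the one from vertex 1 to vertex 4, these being the two diagonals of the quadrilateral with vertices 1, 3, 4, 6.

```python
def partition_cost(triangles, w):
    """Sum over all triangles of the product of their three vertex weights."""
    return sum(w[a] * w[b] * w[c] for a, b, c in triangles)


# Hypothetical weights for the six vertices of a hexagon.
w = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7}

after_swap = [(1, 2, 3), (1, 3, 4), (1, 4, 6), (4, 5, 6)]
before_swap = [(1, 2, 3), (1, 3, 6), (3, 4, 6), (4, 5, 6)]

cost_after = partition_cost(after_swap, w)
cost_before = partition_cost(before_swap, w)
```

For these weights exchanging the diagonal lowers the cost from 430 to 344; only the two triangles inside the quadrilateral change.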

We will prove that an order of multiplying (n-1) matrices corresponds to a
partition of a convex polygon with n sides. The cost of the partition is the
total number of operations needed in multiplying the matrices. For brevity, we
shall use n-gon to mean a convex polygon with n sides, and the partition of an
n-gon to mean the partitioning of an n-gon into (n-2) non-intersecting
triangles.

For any n-gon, one side of the n-gon will be considered to be its base, and
will usually be drawn horizontally at the bottom, such as the side $V_1$-$V_6$
in Fig. 1.
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This  side  will  be  called  the  base,  all  other  sides  are  considered  in  a  clockwise  way. 
Thus, $V_1$-$V_2$ is the first side, $V_2$-$V_3$ the second side, ..., and $V_5$-$V_6$ the fifth side.

The first side represents the first matrix in the matrix chain and the base
represents the final result M in (1). The dimensions of a matrix are the two
weights associated with the two end vertices of the side. Since the adjacent
matrices are compatible, the dimensions $w_1 \times w_2$, $w_2 \times w_3$, ...,
$w_{n-1} \times w_n$ can be written inside the vertices as $w_1, w_2, \ldots,
w_n$. The diagonals are the partial products. A partition of an n-gon
corresponds to an alphabetic tree of n-1 leaves or the parenthesis problem of
n-1 symbols (see, for example, ref. 5). It is easy to see the one-to-one
correspondence between the multiplication of n-1 matrices and either the
alphabetic tree or the parenthesis problem of n-1 symbols. We state this fact
as Lemma 1.

Lemma  1.  Any  order  of  multiplying  n-1  matrices  corresponds  to  a  partition  of  an 
n-gon.  ■ 


We  can  also  establish  the  correspondence  between  the  matrix-chain  products 
and  the  partitions  of  a  convex  polygon  directly.  See  ref.  8  for  more  details. 

Lemma 2. The minimum numbers of operations to evaluate the following matrix
chain products are identical:

$$M_1 \times M_2 \times \cdots \times M_{n-2} \times M_{n-1}$$
$$M_n \times M_1 \times \cdots \times M_{n-3} \times M_{n-2}$$
$$\vdots$$
$$M_2 \times M_3 \times \cdots \times M_{n-1} \times M_n$$

where $M_i$ has dimension $w_i \times w_{i+1}$ and $w_{n+1} = w_1$. Note that in
the first matrix chain, the resulting matrix is of dimension $w_1$ by $w_n$. In
the last matrix chain, the resulting matrix is of dimension $w_2$ by $w_1$. But
in all cases, the total number of operations in the optimum orders of
multiplication is the same.

Proof. The cyclic permutations of the n-1 matrices all correspond to the same
n-gon and thus have the same optimum partitions. ■

(This lemma was obtained independently in ref. 4 with a long proof.)
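
Lemma 2 is easy to check numerically, because rotating the list of weights only changes which side of the polygon is treated as the base. The Python sketch below is an illustration, not the authors' algorithm: it computes the minimum partition cost with a simple cubic-time dynamic program (hypothetical name `min_partition_cost`) and evaluates it for every rotation of a sample weight list.

```python
def min_partition_cost(w):
    """Minimum cost of partitioning the polygon whose vertices, in cyclic
    order, carry the weights in `w` (cubic-time dynamic program)."""
    n = len(w)
    # cost[i][j] is the optimum for the sub-polygon on vertices i, i+1, ..., j.
    cost = [[0] * n for _ in range(n)]
    for gap in range(2, n):
        for i in range(n - gap):
            j = i + gap
            cost[i][j] = min(
                cost[i][k] + cost[k][j] + w[i] * w[k] * w[j]
                for k in range(i + 1, j)  # apex of the triangle on side i-j
            )
    return cost[0][n - 1] if n >= 3 else 0


def rotations(w):
    """All cyclic rotations of the weight list."""
    return [w[s:] + w[:s] for s in range(len(w))]


# Hypothetical weights; each rotation is a different matrix chain.
weights = [10, 100, 5, 50, 7]
costs = [min_partition_cost(r) for r in rotations(weights)]
```

All entries of `costs` coincide, as the lemma asserts.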

From  now  on,  we  shall  concentrate  only  on  the  partitioning  problem. 

The diagonals inside the polygon are called arcs. Thus, every partition
consists of n-2 triangles formed by n-3 arcs and n sides.

In a partition of an n-gon, the degree of a vertex is the number of arcs
incident on the vertex plus two (since there are two sides incident on every
vertex).

Lemma 3. In any partition of an n-gon, $n \ge 4$, there are at least two
triangles, each having a vertex of degree two. (For example, in Fig. 1, the
triangle $V_1V_2V_3$ has vertex $V_2$ with degree 2 and the triangle
$V_4V_5V_6$ has vertex $V_5$ with degree 2.) ■

Lemma 4. Let P and P' both be n-gons where the corresponding weights of the
vertices satisfy $w_i \le w_i'$; then the cost of an optimum partition of P is
less than or equal to the cost of an optimum partition of P'. ■

If we use $C(w_1, w_2, w_3, \ldots, w_k)$ to mean the minimum cost of
partitioning the k-gon with weights $w_i$ optimally, Lemma 4 can be stated as

$$C(w_1, w_2, \ldots, w_k) \le C(w_1', w_2', \ldots, w_k') \quad \text{if } w_i \le w_i'.$$

We  say  that  two  vertices  are  connected  in  an  optimum  partition  if  the  two 
vertices  are  connected  by  an  arc  or  if  the  two  vertices  are  adjacent  to  the  same  side. 

In the rest of the paper, we shall use $V_1, V_2, \ldots, V_n$ to denote
vertices which are ordered according to their weights, i.e.,
$w_1 \le w_2 \le \cdots \le w_n$. To facilitate the presentation, we introduce
a tie-breaking rule for vertices of equal weights.

If there are two or more vertices with weights equal to the smallest weight
$w_1$, we can arbitrarily choose one of these vertices to be the vertex $V_1$.
Once the vertex $V_1$ is chosen, further ties in equal weights are resolved by
regarding the vertex which is closer to $V_1$ in the clockwise direction to be
of less weight. With this tie-breaking rule, we can unambiguously label the
vertices $V_1, V_2, \ldots, V_n$ for each choice of $V_1$.
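
The labeling rule can be written as a sort with a two-part key: weight first, clockwise distance from the chosen $V_1$ second. The Python sketch below assumes that the weights are listed in clockwise order around the polygon; the function name `label_vertices` is hypothetical.

```python
def label_vertices(weights, first):
    """Return polygon positions in the order V1, V2, ..., Vn.

    `weights` lists the vertex weights in clockwise order (an assumption of
    this sketch) and `first` is the position chosen as V1, which must carry
    the smallest weight. Ties go to the vertex nearer to V1 clockwise.
    """
    n = len(weights)
    if weights[first] != min(weights):
        raise ValueError("V1 must have the smallest weight")
    return sorted(range(n), key=lambda pos: (weights[pos], (pos - first) % n))


weights = [3, 1, 2, 1, 2]
order_a = label_vertices(weights, first=1)
order_b = label_vertices(weights, first=3)
```

Here both positions 1 and 3 carry the smallest weight, so there are two admissible labelings, one for each choice of $V_1$.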

We shall use $V_a, V_b, \ldots$ to denote vertices which are unordered in
weights, and $T_{ijk}$ to denote the product of the weights of any three
vertices $V_i$, $V_j$ and $V_k$.
3. SOME CHARACTERISTICS OF THE OPTIMUM PARTITIONS. First, let
us consider the polygons where there are two or more vertices with equal
weights $w_1$.

Lemma 5. For every choice of $V_1, V_2, \ldots$ (as prescribed), if the weights
of the vertices satisfy the condition

$$w_1 = w_2 < w_3 \le \cdots \le w_n,$$

then $V_1$-$V_2$ exists in every optimum partition of the n-gon.

Proof. The lemma is true if $V_1$-$V_2$ is a side of the n-gon. Hence, we can
assume that $V_1$, $V_2$ are not adjacent to the same side of the n-gon.

The proof is by induction on the size of the n-gon. The lemma is true for a
triangle and a quadrilateral. Assume that the lemma is true for all k-gons
($3 \le k \le n-1$) and consider the optimum partitions of an n-gon.
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By  Lemma  3,  we  know  that  there  are  at  least  two  vertices  with  degree  two  in 
each  optimum  partition  of  the  n-gon.  We  have  the  following  two  cases. 

(i) In an optimum partition of an n-gon, one of the vertices with degree two,
say $V_j$, has weight larger than $w_1$. In this case, we can form an (n-1)-gon
by removing $V_j$ with its two sides. By the induction assumption, $V_1$-$V_2$
is present in every optimum partition of the (n-1)-gon.

(ii) Consider the complementary case of (i), i.e., all vertices with degree two
have weights equal to $w_1$ in an optimum partition of the n-gon. In other
words, $V_1$ and $V_2$ are the only two vertices with degree two in that
optimum partition, as shown symbolically in Fig. 2a. Note that every arc in the
optimum partition must dissect the n-gon into two subpolygons in such a way
that $V_1$, $V_2$ can never appear in any subpolygon together, else there will
be more than two vertices with degree two in the optimum partition. In Fig. 2b,
we show a partition of the n-gon in which $V_1$ and $V_2$ are connected. Let us
denote the n-2 triangles in Fig. 2a by $P_1, P_2, \ldots, P_{n-2}$.

Except $P_1$ and $P_{n-2}$, all the other n-4 triangles are made up of one side
and two arcs each. For each of these n-4 triangles, we can find a unique
triangle in Fig. 2b such that they both consist of the same side. We use $P_i'$
to denote the image of $P_i$
in  Fig.  2b.  The  only  two  triangles  left  unmatched  in  Fig.  2b  are  V^VaV ^  and 
and they are the images of $P_1$ and $P_{n-2}$, respectively. Let the cost of
$P_i$ be $C_i$ and the cost of $P_i'$ be $C_i'$. Since $C_i' \le C_i$ for
$1 \le i \le n-2$, the partition in Fig. 2b is cheaper than that in Fig. 2a and
we have a contradiction. ■


Fig.  2 

Theorem 1. For every choice of $V_1, V_2, \ldots$ (as prescribed), if the weights of
the vertices satisfy the condition

    $w_1 = w_2 < w_3 \le w_4 \le \cdots \le w_n$,

then every optimum partition of the n-gon must contain a triangle $V_1 V_2 V_p$ for
some vertex $V_p$ with weight equal to $w_3$. Note that if
$w_1 = w_2 < w_3 < w_4 \le \cdots \le w_n$, then every optimum partition must contain
the triangle $V_1 V_2 V_3$, since there is a unique choice of $V_3$.

Proof. As with Lemma 5, we can prove this theorem by induction on the size of
the n-gon. The theorem is true for any triangle or quadrilateral satisfying the
above condition. Assume the theorem is true for all k-gons ($3 \le k \le n-1$) and
consider the optimum partitions of an n-gon.

From Lemma 5, we know that $V_1$ and $V_2$ are always connected in every optimum
partition. Hence, without loss of generality, we can assume $V_1$ and $V_2$ to be
adjacent to the same side of the n-gon. Again, we have the following two cases.

(i) In an optimum partition, one of the vertices with degree two, say $V_j$, has
weight larger than $w_3$. In this case, we can remove $V_j$ with its sides and form
an (n-1)-gon. By the induction assumption, every optimum partition of the (n-1)-gon
contains a triangle $V_1 V_2 V_p$ where $w_p = w_3$.

(ii) Consider the complementary case of (i), i.e., in an optimum partition of the
n-gon, all vertices with degree two have weights less than or equal to $w_3$. Since
$V_1$-$V_2$ is a side of the n-gon, for $n \ge 4$, either $V_1$ or $V_2$ (but not
both) can have degree two.

We have the following two subcases:

(a) If there is more than one vertex whose weight equals $w_3$, we can form
an (n-1)-gon by removing one of those degree-two vertices whose weight equals $w_3$.
By the induction assumption, every optimum partition of the (n-1)-gon contains a
triangle $V_1 V_2 V_p$ for some $V_p$ with $w_p = w_3$.

(b) There exists only one vertex of weight $w_3$. In this case, there must be
only two vertices with degree two in the optimum partition of the n-gon. These two
vertices are $V_3$ and either $V_1$ or $V_2$. Without loss of generality, we can
assume $V_1$ has degree two. The situation is shown symbolically in Fig. 3a. Again,
every arc in the optimum partition must dissect the n-gon in such a way that $V_1$
and $V_3$ never appear together in any subpolygon. In Fig. 3b, we show a partition
containing the triangle $V_1 V_2 V_3$. Using arguments similar to those in the proof
of Lemma 5, we can show that the partition in Fig. 3b is cheaper and we obtain a
contradiction. ■

Fig. 3

Theorem 2. For every choice of $V_1, V_2, \ldots$ (as prescribed), if the weights of
the vertices of the n-gon satisfy the following condition,

    $w_1 = w_2 = \cdots = w_k < w_{k+1} \le \cdots \le w_n$

for some k, $3 \le k < n$, then every optimum partition of the n-gon contains the
k-gon $V_1$-$V_2$-$\cdots$-$V_k$.


Proof.  The  proof  is  by  induction  on  the  size  of  the  n-gon.  The  theorem  is  true  for 
any  triangle  and  quadrilateral.  Suppose  the  theorem  is  true  for  all  polygons  with 
(n-1)  sides  or  less  and  consider  the  optimum  partitions  of  an  n-gon. 

From  Lemma  3,  there  exist  at  least  two  vertices  having  degree  two  in  every 
optimum  partition.  We  have  the  following  two  cases. 

(i) In an optimum partition of the n-gon, one of the vertices with degree two, say
$V_j$, has weight larger than $w_1$. In this case, we can remove the vertex $V_j$
with its two sides and obtain an (n-1)-gon. By the induction assumption, every
optimum partition of the (n-1)-gon contains the k-gon $V_1$-$V_2$-$\cdots$-$V_k$.

(ii) Consider the complementary case of (i), i.e., all the vertices with degree two
have weights equal to $w_1$ in an optimum partition. Let two of these vertices be
$V_i$ and $V_j$. We have the following two subcases:

(a) k > 3. We first form an (n-1)-gon by removing $V_i$ and its two sides.
There are (k-1) vertices with weights equal to $w_1$ in the (n-1)-gon. By the
induction assumption, every optimum partition of the (n-1)-gon contains the (k-1)-gon
which includes $V_j$ as one of its vertices. Since $V_j$ has degree two in the
optimum partition, its two neighboring vertices, say $V_x$ and $V_y$, must also have
weights equal to $w_1$, and the arc $V_x$-$V_y$ exists in the optimum partition
(Fig. 4). Similarly, we can remove the vertex $V_j$ with its two sides $V_j$-$V_x$
and $V_j$-$V_y$ and form an (n-1)-gon. By the induction assumption, every optimum
partition of this (n-1)-gon contains the (k-1)-gon formed by the (k-1) vertices with
weights equal to $w_1$ in the (n-1)-gon, and $V_i$ is one of the vertices of that
(k-1)-gon. Now, by pasting the triangle $V_x V_j V_y$ and the (k-1)-gon together, we
form a k-gon which includes all the vertices with weight equal to $w_1$ in the n-gon,
and this k-gon is contained in the optimum partition of the n-gon.


Fig.  4 


(b) k = 3. In this case, we have $w_1 = w_2 = w_3 < w_4 \le \cdots \le w_n$. Without
loss of generality, we can assume $V_1$ and $V_2$ both have degree two in an optimum
partition. Again, we can form an (n-1)-gon by removing $V_1$ and its two sides. By
Lemma 5, $V_2$ and $V_3$ are connected in every optimum partition of the (n-1)-gon.
Since $V_2$ has degree two, $V_2$-$V_3$ must be a side of the n-gon. Next, we can
remove $V_2$ with its two sides and form an (n-1)-gon. By Lemma 5, $V_1$ and $V_3$
are connected by a side of the n-gon. The situation is shown in Fig. 5a. Then, the
partition in Fig. 5b is cheaper because

    $T_{123} + T_{12y} < T_{13x} + T_{23y}$

and

    $C(w_1, w_x, \ldots, w_y) \le C(w_3, w_x, \ldots, w_y)$.
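
As a check on the first inequality, the following worked expansion may help. It assumes, as in the matrix-chain setting and not restated in this section, that the cost of a triangle is the product of its three vertex weights, $T_{ijk} = w_i w_j w_k$, and it uses $w_1 = w_2 = w_3$ from subcase (b).

```latex
\begin{align*}
T_{123} + T_{12y} &= w_1^3 + w_1^2 w_y, \\
T_{13x} + T_{23y} &= w_1^2 w_x + w_1^2 w_y, \\
(T_{13x} + T_{23y}) - (T_{123} + T_{12y}) &= w_1^2 \, (w_x - w_1) > 0 .
\end{align*}
```

The last step uses $w_x > w_1$, which holds because for k = 3 only $V_1$, $V_2$ and $V_3$ have weight $w_1$ and $V_x$ is none of them.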


Fig. 5


Now, whenever we have three or more vertices with weights equal to $w_1$ in
the n-gon, we can decompose the n-gon into subpolygons by forming the k-gon in
Theorem 2. The partition of the k-gon can be arbitrary, since all vertices of the
k-gon are of equal weight. For any subpolygon with two vertices of weights equal
to $w_1$, we can always apply Theorem 1 and decompose the subpolygon into smaller
subpolygons. Hence, we have only to consider the polygons with a unique choice of
$V_1$, i.e., each polygon has only one vertex with weight equal to $w_1$.

Theorem 3. For every choice of $V_1, V_2, \ldots$ (as prescribed), if the weights of
the vertices satisfy the condition

    $w_1 < w_2 \le w_3 \le \cdots \le w_n$,

then $V_1$-$V_2$ and $V_1$-$V_3$ exist in every optimum partition of the n-gon.

Proof. We can again use Lemma 3 and prove Theorem 3 by induction on the size
of the n-gon. ■

(a) A stable partition    (b) An optimum partition

Fig.  6 


In any partition of an n-gon, every arc dissects a unique quadrilateral. Let
$V_x$, $V_y$, $V_z$, $V_w$ be the four vertices of an inscribed quadrilateral and
$V_x$-$V_z$ be the arc which dissects the quadrilateral. We define $V_x$-$V_z$ to be
a vertical arc if (6) or (7) is satisfied.

    $\min(w_x, w_z) < \min(w_y, w_w)$                                        (6)

    $\min(w_x, w_z) = \min(w_y, w_w)$,  $\max(w_x, w_z) \le \max(w_y, w_w)$    (7)

We define $V_x$-$V_z$ to be a horizontal arc if (8) is satisfied.

    $\min(w_x, w_z) > \min(w_y, w_w)$,  $\max(w_x, w_z) < \max(w_y, w_w)$      (8)

For  brevity,  we  shall  use  h-arcs  and  v-arcs  to  denote  horizontal  arcs  and  vertical 
arcs  from  now  on. 

Corollary 2. All arcs in an optimum partition must be either vertical arcs or
horizontal arcs. ■
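
The definitions (6), (7) and (8) can be sketched as a small classifier. The function name `classify_arc` and the string labels are illustrative assumptions, not notation from the paper; the arguments are the weights of the arc endpoints ($w_x$, $w_z$) and of the other two vertices of the dissected quadrilateral ($w_y$, $w_w$).

```python
def classify_arc(wx, wz, wy, ww):
    """Classify the arc V_x-V_z of the quadrilateral V_x V_y V_z V_w.

    Returns "vertical" if (6) or (7) holds, "horizontal" if (8) holds,
    and "neither" otherwise.
    """
    arc_min, arc_max = min(wx, wz), max(wx, wz)
    other_min, other_max = min(wy, ww), max(wy, ww)

    if arc_min < other_min:                              # condition (6)
        return "vertical"
    if arc_min == other_min and arc_max <= other_max:    # condition (7)
        return "vertical"
    if arc_min > other_min and arc_max < other_max:      # condition (8)
        return "horizontal"
    return "neither"


# Pentagon with weights 10, 11, 25, 40, 12: the arc joining the vertices of
# weight 11 and 12 dissects the quadrilateral whose other vertices weigh 10 and 25.
print(classify_arc(11, 12, 10, 25))
```

By Corollary 2, an arc for which this sketch returns "neither" cannot appear in an optimum partition.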

Theorem 5. Let $V_x$ and $V_z$ be two arbitrary vertices which are not adjacent in a
polygon, let $V_w$ be the smallest vertex from $V_x$ to $V_z$ in the clockwise manner
($V_w \ne V_x$, $V_w \ne V_z$), and let $V_y$ be the smallest vertex from $V_z$ to
$V_x$ in the clockwise manner ($V_y \ne V_x$, $V_y \ne V_z$). This is shown in Fig. 7,
where we assume that $w_x \le w_z$ and $w_y \le w_w$. The necessary condition for
$V_x$-$V_z$ to exist as an h-arc in any optimum partition is

    $w_y < w_x \le w_z < w_w$. ■

Fig. 7

We call any arc which satisfies this necessary condition a potential h-arc.

Let P be the set of potential h-arcs in the n-gon and H be the set of h-arcs in the
optimum partitions; we have $P \supseteq H$, where the inclusion could be proper.
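
A minimal sketch of the necessary condition of Theorem 5 follows. It assumes the weights are listed in clockwise order and vertices are addressed by 0-based list index; the name `is_potential_h_arc` is hypothetical. Because the theorem fixes $w_x \le w_z$ and $w_y \le w_w$ only by relabelling, the sketch sorts each pair instead of requiring the caller to do so.

```python
def is_potential_h_arc(weights, x, z):
    """Return True if the arc between vertices x and z satisfies
    w_y < w_x <= w_z < w_w, where w_y and w_w are the smallest weights
    strictly between the endpoints on the two sides of the arc."""
    n = len(weights)
    d = (z - x) % n
    # Vertices strictly between x and z, and strictly between z and x, clockwise.
    side_a = [weights[(x + i) % n] for i in range(1, d)]
    side_b = [weights[(z + i) % n] for i in range(1, n - d)]
    if not side_a or not side_b:
        return False  # x and z are adjacent (or equal), so this is not an arc

    arc_lo, arc_hi = sorted((weights[x], weights[z]))
    w_y, w_w = sorted((min(side_a), min(side_b)))
    return w_y < arc_lo <= arc_hi < w_w


weights = [10, 11, 25, 40, 12]
print([(x, z) for x in range(5) for z in range(x + 1, 5)
       if is_potential_h_arc(weights, x, z)])
```

The test is only a necessary condition: an arc that passes it belongs to P but need not belong to H.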

Corollary 3. Let $V_w$ be the largest vertex in the polygon and $V_x$ and $V_z$ be
its two neighboring vertices. If there exists a vertex $V_y$ such that $w_y < w_x$
and $w_y < w_z$, then $V_x$-$V_z$ is a potential h-arc. ■

Two arcs are called compatible if both arcs can exist simultaneously in a
partition. If all weights of the vertices are distinct, there are (n-1)!
distinct permutations of the weights around an n-gon. For example, the weights 10,
11, 25, 40, 12 in Fig. 6(a) correspond to the permutation $w_1, w_2, w_4, w_5, w_3$
(where $w_1 < w_2 < w_3 < w_4 < w_5$). There are infinitely many values of the
weights which correspond to the same permutation. For example, 1, 16, 34, 77, 29
also correspond to $w_1, w_2, w_4, w_5, w_3$, but the optimum partition is different
from that of 10, 11, 25, 40, 12. However, all the potential h-arcs in all the n-gons
with the same permutation of weights are compatible. We state this remarkable fact
as Theorem 6.

Theorem  6.  All  potential  h-arcs  are  compatible.  ■ 
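
The two five-vertex examples above can be checked with a brute-force search. The sketch below is the standard O(n^3) dynamic program for polygon triangulation, not the algorithm of this paper, and it assumes that the cost of a triangle is the product of its three vertex weights (the matrix-chain cost). The name `optimum_partition` and the 0-based vertex indices are illustrative.

```python
from functools import lru_cache


def optimum_partition(weights):
    """Return (cost, arcs) of one minimum-cost partition of a convex polygon
    whose vertex weights are listed in clockwise order."""
    w = tuple(weights)
    n = len(w)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Cheapest partition of the subpolygon V_i, V_{i+1}, ..., V_j.
        if j - i < 2:
            return 0, frozenset()
        options = []
        for k in range(i + 1, j):
            left_cost, left_arcs = best(i, k)
            right_cost, right_arcs = best(k, j)
            arcs = set(left_arcs) | set(right_arcs)
            if k - i >= 2:
                arcs.add((i, k))      # V_i-V_k is an arc, not a side
            if j - k >= 2:
                arcs.add((k, j))      # V_k-V_j is an arc, not a side
            cost = left_cost + right_cost + w[i] * w[k] * w[j]
            options.append((cost, frozenset(arcs)))
        return min(options, key=lambda option: option[0])

    cost, arcs = best(0, n - 1)
    return cost, sorted(arcs)


print(optimum_partition([10, 11, 25, 40, 12]))  # (16620, [(1, 4), (2, 4)])
print(optimum_partition([1, 16, 34, 77, 29]))   # (5395, [(0, 2), (0, 3)])
```

Under that cost assumption the first polygon is best partitioned by two arcs from the vertex of weight 12, while the second is best partitioned by two arcs from the vertex of weight 1, even though both share the same permutation of weights.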

Note that any potential h-arc $V_x$-$V_z$, like the one in Fig. 7, always dissects
the n-gon into two subpolygons, and one of these subpolygons has the property that
all its vertices except $V_x$ and $V_z$ have weights larger than $\max(w_x, w_z)$.
We shall call this subpolygon the upper subpolygon of $V_x$-$V_z$. For example, the
subpolygon $V_x$-...-$V_w$-...-$V_z$ in Fig. 7 is the upper subpolygon of $V_x$-$V_z$.

Using  Corollary  3  and  Theorem  6,  we  can  generate  all  the  potential  h-arcs  of 
a  polygon. 

Let $V_x$-$V_z$ be the arc defined in Corollary 3. The arc $V_x$-$V_z$ is a potential
h-arc compatible with all other potential h-arcs in the n-gon. Furthermore, there is
no other potential h-arc in its upper subpolygon. Now consider the (n-1)-gon
obtained by cutting out $V_w$. In this (n-1)-gon, let $V_{w'}$ be the largest vertex
and $V_{x'}$ and $V_{z'}$ be the two neighbors of $V_{w'}$. Then $V_{x'}$-$V_{z'}$ is
again a potential h-arc compatible with all other potential h-arcs in the n-gon, and
there is no other potential h-arc in its upper subpolygon which has not been
generated. This is true even if $V_w$ is in the upper subpolygon of
$V_{x'}$-$V_{z'}$. If we repeat the process of cutting out the largest vertex, we get
n-3 arcs, all of which satisfy Theorem 3.

The  set  of  h-arcs  of  the  optimum  partitions  must  be  a  subset  of  these  n-3  arcs. 

The process of cutting out the largest vertex can be made into an algorithm
which is O(n). We shall call this algorithm the one-sweep algorithm. The output
of the one-sweep algorithm is a set S of n-3 arcs. S is empty initially.

The one-sweep algorithm:

Starting from the smallest vertex, say $V_1$, we travel clockwise around the
polygon and push the weights of the vertices successively onto the stack as follows
($w_1$ will be at the bottom of the stack).

(a) Let $V_t$ be the top element on the stack, $V_{t-1}$ be the element immediately
below $V_t$, and $V_c$ be the element to be pushed onto the stack. If there are two
or more vertices on the stack and $w_t > w_c$, add $V_{t-1}$-$V_c$ to S and pop $V_t$
off the stack; if there is only one vertex on the stack or $w_t \le w_c$, push $w_c$
onto the stack. Repeat this step until the nth vertex has been pushed onto the stack.

(b) If there are more than three vertices on the stack, add $V_{t-1}$-$V_1$ to S,
pop $V_t$ off the stack and repeat this step; else stop.
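
Steps (a) and (b) can be sketched as follows. The function name `one_sweep`, the 0-based indices and the list-based stack are illustrative assumptions. The sketch also adds one guard that the text does not state: in step (a) it does not pop when the stack holds only two vertices and $V_c$ is the nth vertex, because the pair $V_1$-$V_n$ produced in that situation is a side of the polygon rather than an arc.

```python
def one_sweep(weights):
    """Return the list S of n-3 candidate arcs as (index, index) pairs.

    `weights` lists the vertex weights in clockwise order.
    """
    n = len(weights)
    start = min(range(n), key=lambda i: weights[i])     # smallest vertex, V_1
    order = [(start + i) % n for i in range(n)]         # clockwise from V_1
    last = order[-1]

    stack, arcs = [], []
    for c in order:
        # Step (a): cut out the stack top while it is heavier than V_c.
        while len(stack) >= 2 and weights[stack[-1]] > weights[c]:
            if len(stack) == 2 and c == last:
                break                       # guard: V_1-V_n is a side, not an arc
            arcs.append((stack[-2], c))     # arc V_{t-1}-V_c
            stack.pop()
        stack.append(c)

    # Step (b): the stack is now non-decreasing from bottom to top.
    while len(stack) > 3:
        arcs.append((stack[-2], stack[0]))  # arc V_{t-1}-V_1
        stack.pop()
    return arcs


print(one_sweep([10, 11, 25, 40, 12]))  # [(2, 4), (1, 4)]
print(one_sweep([1, 2, 3, 4, 5]))       # [(3, 0), (2, 0)]
```

Every vertex is pushed once and popped at most once, which is where the O(n) bound comes from.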

Since we do not check for the existence of a smallest vertex whose weight is
strictly less than those of the two neighbors of the largest vertex, i.e., the
existence of the vertex $V_y$ in Theorem 3, not all the n-3 arcs generated by the
algorithm are potential h-arcs. However, the one-sweep algorithm always generates a
set S of n-3 arcs which contains the set P of all potential h-arcs, which in turn
contains the set H of all h-arcs in the optimum partitions of the n-gon, i.e.,

    $S \supseteq P \supseteq H$,

where each inclusion could be proper. For example, if the weights of the vertices
around the n-gon in the clockwise direction are $w_1, w_2, \ldots, w_n$ where
$w_1 \le w_2 \le \cdots \le w_n$, none of the arcs in the n-gon can satisfy Theorem 3
and hence there are no potential h-arcs in the n-gon. The one-sweep algorithm would
still generate n-3 arcs for the n-gon, but none of the arcs generated are potential
h-arcs.

4. CONCLUSION. In this paper, we have shown the one-to-one correspondence
between the orders of multiplying a chain of matrices and the partitions of an
n-sided convex polygon. Then some theorems on the properties of the optimum
partitions are presented. We have skipped some of the proofs and interested readers
should refer to ref. 8 for details. Based on these theorems, an O(n) algorithm
for finding a near-optimum partition can be developed (ref. 9). The cost of the
partition produced by the heuristic algorithm never exceeds 1.155 $C_{opt}$, where
$C_{opt}$ is the optimum cost of partitioning the polygon. An O(n log n) algorithm
for finding an optimum partition is also presented in ref. 8.